
Supercomputer

A supercomputer is a high-performance computing system comprising thousands of interconnected processors and nodes that operate in parallel to execute computationally intensive tasks at speeds orders of magnitude greater than general-purpose computers, with performance typically benchmarked in floating-point operations per second (FLOPS). These machines emerged in the 1960s, with the Control Data Corporation (CDC) 6600, designed by Seymour Cray, recognized as the first true supercomputer capable of up to 3 million instructions per second, revolutionizing scientific simulations previously limited by computational power. Key milestones include the Cray-1 in 1976, which introduced vector processing and achieved peak speeds of 160 megaFLOPS, and subsequent vector and massively parallel architectures that propelled advancements in fields like aerodynamics, nuclear weapons modeling, and weather prediction. Modern supercomputers, ranked biannually by the TOP500 list using the High-Performance LINPACK benchmark, have reached exascale performance—over one quintillion FLOPS—with El Capitan at Lawrence Livermore National Laboratory holding the top position as of June 2025 at approximately 1.742 exaFLOPS Rmax. They enable breakthroughs such as protein folding simulations for drug discovery, climate modeling for environmental forecasting, and astrophysical computations, though their massive energy demands—often exceeding 20 megawatts—highlight ongoing challenges in efficiency and scalability.

Definition and Characteristics

Core Attributes and Scale

A supercomputer constitutes a system engineered to achieve peak computational throughput for tackling intricate, data-intensive simulations and optimizations beyond the capacity of standard commodity hardware. Its efficacy hinges on sustained floating-point operations per second (FLOPS), a metric prioritizing arithmetic intensity over instruction counts, with contemporary exemplars registering petaFLOPS (10^15 FLOPS) or higher on the High-Performance Linpack benchmark, which evaluates dense linear algebra solvability under realistic memory constraints. This throughput derives from causal necessities in domains demanding iterative matrix manipulations or integrations, where sequential processing yields prohibitive latencies. Fundamental attributes encompass massive parallelism, orchestrating cooperative execution across thousands to millions of cores or processors to partition workloads into concurrent subtasks, thereby amortizing overheads intrinsic to synchronization and load imbalance. Complementing this are high-speed interconnects, featuring sub-microsecond latencies and terabit-per-second aggregate bandwidths via specialized fabrics like InfiniBand or proprietary topologies (e.g., dragonfly or torus), which mitigate communication bottlenecks that would otherwise cap effective scalability in distributed-memory paradigms. Fault-tolerant architectures further underpin reliability, incorporating hardware redundancy, error-correcting codes, and software-level checkpointing to counteract mean-time-between-failures dropping to hours at node counts exceeding 100,000, ensuring mission-critical uptime without recalculating from inception. Scale manifests in modular node aggregation, routinely spanning tens of thousands of compute units with petabytes of aggregate memory, calibrated to thresholds where incremental additions preserve near-linear speedup within Amdahl's-law bounds. Empirically, the historical delineation from high-end clusters emerged around sustained 1 petaFLOPS capabilities in 2008, reflecting the onset of petascale viability for grand-challenge problems; present-day delineation demands exaFLOPS regimes to outpace commoditized GPU clusters in bandwidth-bound kernels. Vectorizable instruction sets, enabling SIMD execution of dense operations, remain a causal enabler, amplifying throughput by factors of 4–64x over scalar baselines in floating-point dominant codes.
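A minimal sketch, under assumed numbers, of why the overheads named above (load imbalance and per-step synchronization) keep scaling only near-linear rather than linear; the workload size, imbalance factor, and synchronization cost are hypothetical values chosen purely for illustration.

```python
# Illustrative sketch (not from any specific system): models how load imbalance
# and per-step synchronization overhead erode near-linear speedup.
# All numbers below are hypothetical assumptions.

def parallel_speedup(total_work_s, workers, imbalance=0.05, sync_overhead_s=1e-4, steps=1000):
    """Estimate speedup when `total_work_s` seconds of divisible work is split
    across `workers`, the slowest worker carries `imbalance` extra load, and
    every step pays a global synchronization cost."""
    per_worker = total_work_s / workers          # ideal share of the work
    slowest = per_worker * (1 + imbalance)       # the straggler sets the pace
    parallel_time = slowest + steps * sync_overhead_s
    return total_work_s / parallel_time

for p in (1_000, 10_000, 100_000):
    print(f"{p:>7} workers -> speedup ~ {parallel_speedup(3600.0, p):,.0f}x")
```

With these assumed values the model shows speedup flattening well below the worker count as synchronization begins to dominate, which is the effect the checkpointing and interconnect design choices above are meant to contain.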

Differentiation from Standard Computers

Supercomputers differ from standard computers and commodity data center clusters primarily in their architecture, which is engineered for extreme throughput in high-performance computing (HPC) workloads rather than versatility for general-purpose tasks. While standard servers prioritize low-latency responses for interactive applications, such as web serving or database queries, and rely on off-the-shelf Ethernet interconnects with latencies often exceeding 10 microseconds, supercomputers employ specialized fabrics like InfiniBand or proprietary networks delivering sub-microsecond latencies and bandwidths over 200 Gbps per link to minimize communication bottlenecks in massively parallel environments. This tight integration, as seen in massively parallel processing (MPP) systems like IBM's Blue Gene series, ensures nodes are optimized for collective operations rather than independent execution, contrasting with loosely coupled commodity clusters where nodes can operate standalone for diverse, less synchronized tasks. Causally, these design choices stem from the demands of compute-bound, irregular parallelism in HPC, such as computational fluid dynamics (CFD) simulations, which require frequent, fine-grained data exchanges across thousands of processes to resolve complex dependencies. Standard computers, geared toward sequential execution or embarrassingly parallel jobs (e.g., independent parameter sweeps), suffice for such tasks via higher-level abstractions but incur prohibitive overheads in tightly coupled scenarios due to slower interconnects that amplify delays, limiting effective scaling beyond a few dozen nodes. In contrast, supercomputers' low-latency topologies sustain high utilization—often 80-90% for domain-decomposed solvers—by reducing message-passing latencies that would otherwise dominate runtime in distributed-memory paradigms like MPI. Economically, supercomputers' custom optimizations yield superior efficiency for sustained HPC, with purpose-built hardware achieving 2-5 times higher throughput in parallel compute phases compared to general-purpose servers tuned for mixed I/O and latency-sensitive loads. Upfront costs are elevated—typically 2-10 times those of equivalent-scale commodity setups due to specialized components—but total cost of ownership (TCO) over 3-5 years can be 20-50% lower for dedicated scientific simulations versus cloud-based alternatives, factoring in energy savings and avoided provisioning overheads from underutilized general resources. This trade-off favors bespoke systems where workloads exhibit predictable, high-intensity parallelism, though it diminishes for bursty or heterogeneous enterprise computing better served by scalable, pay-per-use data centers.
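A back-of-the-envelope sketch of the latency argument above: each iteration of a tightly coupled solver does some local compute, then exchanges neighbor messages, so the interconnect latency directly sets the fraction of time spent computing. The per-iteration compute time and message count below are assumed values, not measurements.

```python
# Hypothetical comparison of interconnect latency impact on a tightly coupled
# solver iteration (local compute followed by a halo exchange with neighbors).

def iteration_efficiency(compute_s, msgs_per_iter, latency_s):
    """Fraction of wall time spent computing rather than waiting on messages."""
    comm_s = msgs_per_iter * latency_s
    return compute_s / (compute_s + comm_s)

compute_per_iter = 200e-6   # 200 microseconds of local work per iteration (assumed)
messages = 26               # e.g., a 3-D halo exchange with all neighbors (assumed)

for name, latency in [("commodity Ethernet (~10 us)", 10e-6),
                      ("HPC fabric (~1 us)", 1e-6)]:
    eff = iteration_efficiency(compute_per_iter, messages, latency)
    print(f"{name}: parallel efficiency ~ {eff:.0%}")
```

Under these assumptions the slow interconnect drops efficiency to roughly 40-50%, while the sub-microsecond fabric stays near the 80-90% range cited above, which is the mechanism behind the limited scaling of commodity setups for tightly coupled codes.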

Historical Development

Early Foundations (Pre-1990)

The origins of supercomputing trace back to the 1940s with the development of large-scale electronic computers for military applications. The ENIAC (Electronic Numerical Integrator and Computer), completed in 1945 at the University of Pennsylvania, served as a proto-supercomputer primarily for artillery ballistics computations during World War II, marking the shift from mechanical to electronic digital computing at scale. It utilized over 17,000 vacuum tubes and achieved a peak performance of approximately 500 floating-point operations per second (FLOPS), enabling rapid trajectory calculations that manual methods could not match. This machine's design emphasized programmability and speed, laying groundwork for handling complex scientific simulations driven by defense imperatives. In the 1960s, advancements in transistor technology enabled the first machines explicitly recognized as supercomputers. The CDC 6600, designed by Seymour Cray and released in 1964 by Control Data Corporation, is widely acknowledged as the inaugural supercomputer, outperforming contemporaries by a factor of three with a peak performance of 3 megaFLOPS. Featuring a 100-nanosecond clock cycle and multiple peripheral processors to offload tasks from the central unit, it addressed early limitations in instruction throughput through innovative architecture that prioritized computational density over general-purpose versatility. Cold War-era demands for nuclear weapons modeling and simulations at institutions like Los Alamos and Lawrence Livermore national laboratories propelled such developments, necessitating custom discrete logic to achieve reliable high-speed operation. The 1970s brought further refinements in single-processor designs, culminating in vector processing to mitigate the von Neumann bottleneck—where sequential memory access limits computational speed—via pipelined operations that processed arrays of data in parallel streams. Seymour Cray's Cray-1, introduced in 1976 by Cray Research, exemplified this approach with its C-shaped architecture minimizing wire lengths for reduced latency and a peak performance of 160 MFLOPS, a fifty-fold improvement over the CDC 6600. It employed scalar and vector units with deep pipelines, allowing sustained high throughput on scientific workloads like fluid dynamics and weather prediction, while an innovative Freon-based cooling system prevented thermal throttling in densely packed circuitry. These systems' evolution from kiloFLOPS to megaFLOPS scales was causally tied to escalating computational needs in defense and energy research, fostering custom silicon innovations despite fabrication challenges of the era.

Parallel Processing Era (1990s-2010s)

The 1990s marked a pivotal shift in supercomputer architecture from vector processors to massively parallel processing (MPP) systems employing distributed memory architectures, driven by the diminishing returns of vector designs amid advancing clock speeds enabled by Moore's Law and the rising viability of commodity off-the-shelf (COTS) components. This transition addressed scalability bottlenecks in shared-memory vector machines, which struggled with synchronization overheads at larger scales, favoring instead message-passing paradigms like MPI for explicit parallelism across thousands of nodes. The U.S. Department of Energy's Accelerated Strategic Computing Initiative (ASCI), launched in 1995 to simulate nuclear weapons without testing, exemplified this era's focus; its Intel-based ASCI Red, deployed in 1997 at Sandia National Laboratories, became the first supercomputer to sustain 1.068 teraflops on the LINPACK benchmark, utilizing 9,072 Pentium Pro processors interconnected via a fat-tree topology. Economic factors accelerated adoption of MPP through plummeting prices of dynamic random-access memory (DRAM) and network interface cards (NICs), reducing the cost per gigaflop and enabling clusters built from standard PC hardware, as seen in early Beowulf projects. By the mid-2000s, this commoditization propelled cluster architectures to dominance, with systems scaling to tens of thousands of nodes via Ethernet or InfiniBand fabrics, though Amdahl's law imposed fundamental limits by highlighting that even small serial fractions—often 5-10% in scientific codes—constrained overall speedup, necessitating algorithmic redesigns for near-perfect parallelism. IBM's Blue Gene/L, installed at Lawrence Livermore National Laboratory in 2004, advanced power-efficient design, achieving a peak of 280 teraflops across 65,536 low-power PowerPC 440 nodes at 700 MHz, with a system power draw under 1 MW—far below contemporaries—prioritizing density and reliability for nuclear simulations through a three-dimensional torus interconnect and simplified OS. Entering the 2010s, China's Tianhe-1A at the National Supercomputing Center in Tianjin claimed the top spot in November 2010 with 2.507 petaflops sustained performance, integrating 7,168 NVIDIA Fermi GPUs for acceleration alongside CPUs in a hybrid cluster, signaling China's investment in domestic HPC capabilities amid U.S. export restrictions. Large-scale MPP systems faced persistent reliability challenges, with mean time between failures (MTBF) dropping below 40 hours for petascale machines due to aggregated component error rates, necessitating checkpoint-restart mechanisms and error-correcting codes; studies of systems like Blue Gene/L reported over 1,000 hardware faults annually, often from network or power subsystems, underscoring trade-offs in scaling where node count growth amplified failure probabilities despite per-component reliability gains. These architectural evolutions traded vector simplicity for MPP's raw throughput, fostering applications in climate modeling and molecular dynamics but demanding sophisticated software stacks to mitigate inherent bottlenecks.
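A minimal worked example of the Amdahl's-law limit mentioned above, showing how the 5-10% serial fractions typical of scientific codes cap speedup regardless of node count; the processor counts are arbitrary illustrative choices.

```python
# Amdahl's law: speedup = 1 / (s + (1 - s) / P) for serial fraction s and P processors.

def amdahl_speedup(serial_fraction, processors):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

for s in (0.05, 0.10):                       # 5-10% serial fractions cited above
    for p in (1_000, 100_000):
        print(f"s={s:.0%}, P={p:>7}: speedup = {amdahl_speedup(s, p):6.1f}  (limit {1/s:.0f}x)")
```

Even at 100,000 processors the speedup saturates near 1/s (20x at 5% serial work), which is why the era's MPP systems demanded the algorithmic redesigns described above.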

Exascale and AI Integration (2020s Onward)

The Frontier supercomputer, deployed at Oak Ridge National Laboratory in 2022, became the world's first to surpass the exascale threshold, achieving 1.1 exaFLOPS of sustained performance on the High-Performance Linpack benchmark. By November 2024, optimizations elevated its Rmax to 1.35 exaFLOPS, maintaining its position among the top systems despite subsequent entrants. This milestone marked the transition from petascale to exascale computing, enabled by heterogeneous architectures integrating AMD EPYC CPUs with Instinct MI250X accelerators, though constrained by power limits exceeding 20 megawatts. Subsequent systems expanded the exascale landscape. Aurora, at Argonne National Laboratory, joined as one of the earliest exascale platforms, leveraging Intel Xeon CPU Max processors and Data Center GPU Max accelerators for over one quintillion calculations per second. El Capitan, operational at Lawrence Livermore National Laboratory, claimed the top ranking in June 2025 with superior performance driven by AMD EPYC processors and Instinct MI300A accelerators, alongside Frontier and Aurora forming the core of U.S. exascale capacity. Europe's JUPITER, launched at Forschungszentrum Jülich in September 2025, achieved exascale status as the continent's first such system, ranking fourth globally and emphasizing modular designs with NVIDIA GH200 accelerators for simulation and AI workloads, powered entirely by renewables. The 2020s have seen a pronounced pivot toward AI integration, with GPU accelerators dominating supercomputer architectures. NVIDIA's H100 GPUs feature prominently in TOP500 entries, powering systems like Eos and ASPIRE 2A+ for hybrid HPC-AI tasks, reflecting a shift from CPU-centric designs to heterogeneous setups where accelerators contribute over 95% of peak performance. This evolution addresses the verifiable slowdown in aggregate FLOPS growth post-2020, as power walls—evident in stagnant TOP500 performance curves despite hardware advances—limit raw scaling, prompting specialization in energy-efficient chips for targeted workloads like machine learning training. Private sector builds exemplify this AI focus, circumventing traditional HPC paradigms. xAI's Colossus cluster, assembled in 2024 in Memphis, Tennessee, initially comprised 100,000 NVIDIA H100 GPUs for model training, expanding to 200,000 by early 2025 with H200 additions, prioritizing rapid AI training and inference over general-purpose benchmarks. Such systems underscore trends in accelerator heterogeneity, where custom interconnects like Spectrum-X enable massive parallelism, though they highlight tensions between benchmark metrics optimized for dense linear algebra and AI's sparse, data-intensive demands.

System Architectures

Processing and Acceleration Technologies

Supercomputer processing relies on high-core-count CPUs and accelerators designed for parallel workloads, where throughput stems from exploiting data-level parallelism through vectorized operations and specialized hardware units. Central processing units (CPUs) handle control flow and scalar computations, while accelerators like graphics processing units (GPUs) and application-specific integrated circuits (ASICs) boost floating-point operations per second (FLOPS) in dense matrix and vector tasks by distributing computations across thousands of simpler cores. This heterogeneous approach causally increases effective compute density but introduces data movement costs between host CPUs and devices, impacting efficiency in bandwidth-limited scenarios. Custom RISC processors marked early exascale efforts, as seen in Japan's Fugaku supercomputer, powered by Fujitsu's A64FX ARM-based chips fabricated on a 7 nm process with 48 cores per socket, integrated high-bandwidth memory (HBM2), and Scalable Vector Extension (SVE) supporting up to 512-bit vectors for enhanced SIMD parallelism. Each A64FX delivers 3.379 TFLOPS peak double-precision performance, enabling Fugaku's 442 PFLOPS sustained without dedicated accelerators by prioritizing balanced, wide-vector CPU design. In contrast, the U.S. Frontier system employs AMD's optimized 3rd-generation EPYC CPUs (64 cores at 2 GHz) alongside four Instinct MI250X GPUs per node, totaling 37,888 GPUs across 9,408 nodes for heterogeneous acceleration, where GPUs handle the bulk of parallel FLOPS via matrix cores optimized for AI-like tensor operations. SIMD vector units in both CPUs and GPUs apply identical operations to multiple data elements simultaneously, amplifying throughput in regular, data-parallel kernels such as stencil-based simulations, while tensor cores—specialized matrix multiply-accumulate units in GPUs—accelerate low-precision operations critical for neural-network training, offering 10-100x speedups over scalar units at the cost of reduced numerical precision. Power-performance trade-offs constrain designs, with thermal design power (TDP) limits—such as 560 W per MI250X GPU or 300 W for CPU sockets—forcing choices between clock speed, core count, and efficiency; exceeding TDP risks thermal throttling, while underutilization in sparse or communication-heavy workloads yields diminishing returns due to PCIe or other host-device transfer overheads. As of June 2025, 237 of the TOP500 supercomputers incorporate accelerators or co-processors, reflecting a shift toward GPU dominance in high-end systems for workloads benefiting from massive parallelism, though CPU-only clusters persist for legacy or irregular tasks where accelerator orchestration overheads—including programming complexity and synchronization—can offset gains. ASICs, tailored for specific algorithms like tensor contractions, appear in niche HPC-AI hybrids but lag in versatility compared to programmable GPUs, with adoption limited by development costs and inflexibility to evolving benchmarks.
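A sketch of how a per-socket theoretical peak (Rpeak) follows from the hardware parameters named above. The A64FX clock of 2.2 GHz and its two 512-bit fused multiply-add (FMA) pipelines per core are assumptions not stated in the text, included only to show the arithmetic reaching the quoted 3.379 TFLOPS.

```python
# Rpeak = cores x clock x (FMA pipes x FP64 lanes per vector x 2 FLOPs per FMA).

def rpeak_flops(cores, clock_hz, fma_pipes, vector_bits, element_bits=64):
    lanes = vector_bits // element_bits          # FP64 lanes per vector unit
    flops_per_cycle = fma_pipes * lanes * 2      # an FMA counts as 2 FLOPs
    return cores * clock_hz * flops_per_cycle

a64fx_peak = rpeak_flops(cores=48, clock_hz=2.2e9, fma_pipes=2, vector_bits=512)
print(f"A64FX peak ~ {a64fx_peak / 1e12:.3f} TFLOPS FP64")   # ~3.379 TFLOPS
```

The same arithmetic, applied to GPU matrix units at lower precision, is what produces the order-of-magnitude accelerator advantage described above.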

Interconnection and Scalability Designs

Interconnection networks in supercomputers are designed to minimize latency and maximize bandwidth for data movement between compute nodes, addressing a primary bottleneck in parallel performance. High-performance fabrics such as HPE Cray's Slingshot-11, deployed in exascale systems like Frontier, provide Ethernet-based connectivity with adaptive routing to handle irregular traffic patterns and achieve low tail latency under heavy loads. Similarly, InfiniBand networks, including HDR variants offering 200 Gbps per link, are used in systems like certain DOE facilities for their remote direct memory access (RDMA) capabilities, enabling efficient collective operations in MPI-based applications. Topologies like the fat-tree are prevalent for their non-blocking properties, where multiple levels of switches ensure high bisection bandwidth—defined as the aggregate capacity across the cut dividing the network into equal halves—scaling proportionally with system size to support all-to-all communication patterns without oversubscription. In a k-ary fat-tree, bisection bandwidth can reach O(k^2) under optimal routing, mitigating contention in large-scale collectives, though real implementations often balance cost with partial oversubscription at higher levels. Scalability in supercomputers follows principles like Gustafson's law, which posits that the scaled speedup S with P processors is S = P - s(P - 1), where s is the serial fraction; this supports weak scaling where problem size grows with resources, theoretically allowing near-linear gains for parallelizable workloads. However, empirical limits emerge from communication overheads, with parallel efficiency often dropping to 60-70% at 100,000+ nodes due to increased latency in global collectives and fault recovery, as data movement across fabrics consumes up to 30% of cycle time in memory-bound applications. Emerging optical interconnects address power bottlenecks in data movement, potentially reducing energy per bit by 10x over copper at distances beyond 100 meters through photonic switching, as demonstrated in prototypes for exascale systems where electrical links contribute 20-30% of total power draw. At extreme scales, mean time between failures (MTBF) declines to approximately 1 day or less per system in petascale clusters, scaling inversely with system size due to cumulative hardware fragility, necessitating reliability, availability, and serviceability (RAS) features like silent error detection, checkpointing, and dynamic node sparing to sustain job completion rates above 90%.
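A sketch of the scaled-speedup formula above, combined with a simple (assumed) penalty for the growing cost of global collectives; the serial fraction and the logarithmic collective-cost model are illustrative choices, not measurements of any fabric.

```python
import math

def gustafson(s, p):
    """Scaled speedup S = P - s(P - 1) for serial fraction s and P processors."""
    return p - s * (p - 1)

def efficiency_with_collectives(s, p, collective_cost=0.03):
    """Weak-scaling efficiency when each doubling of P adds a fixed
    collective-communication overhead fraction (assumed model)."""
    ideal = gustafson(s, p) / p
    comm_penalty = 1.0 + collective_cost * math.log2(p)
    return ideal / comm_penalty

s = 0.01
for p in (1_024, 16_384, 131_072):
    print(f"P={p:>7}: scaled speedup {gustafson(s, p):>10,.0f}x, "
          f"efficiency ~ {efficiency_with_collectives(s, p):.0%}")
```

Under these assumptions, efficiency slides from the high 70s toward the 60-70% range at 100,000+ nodes, mirroring the empirical drop-off attributed above to collectives and fault recovery.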

Specialized Versus General-Purpose Systems

Specialized supercomputers employ custom hardware architectures, such as application-specific integrated circuits (ASICs), optimized for particular computational patterns, yielding substantial performance gains and energy efficiencies compared to general-purpose systems. For instance, the Anton series, developed by D.E. Shaw Research, features custom ASICs tailored for molecular dynamics simulations, enabling roughly 100 times faster execution than equivalent general-purpose supercomputers for protein-water systems involving tens of thousands of atoms. This optimization stems from hardware-level approximations of force calculations and neighbor searches, which minimize unnecessary generality and reduce computational overhead inherent in versatile processors. In contrast, general-purpose supercomputers rely on clusters of commodity central processing units (CPUs) and graphics processing units (GPUs), such as those in systems like Frontier or Summit, which prioritize reprogrammability across diverse workloads including high-performance computing (HPC) and artificial intelligence (AI) tasks. These designs facilitate software-driven adaptations without hardware redesigns, but they incur inefficiencies due to the overhead of handling varied instruction sets and data flows not aligned with any single application. Empirical benchmarks reveal that specialized accelerators like Google's Tensor Processing Units (TPUs) outperform CPU/GPU clusters by 15 to 30 times in neural network inference, attributed to fixed-function units that avoid the branching and caching penalties of general-purpose cores. The core trade-offs arise from causal constraints in hardware design: specialized systems achieve energy savings—evidenced by supercomputers incorporating custom processors improving calculations per watt nearly five times faster over time—by eliminating superfluous capabilities, but they face obsolescence risks if algorithmic paradigms evolve beyond the fixed envelope. General-purpose architectures mitigate this through flexibility, allowing sustained utility via reprogramming and software updates, yet they exhibit lower peak efficiencies for targeted domains, as general-purpose processors must balance competing demands like integer operations and floating-point precision across unpredictable workloads. In practice, this manifests in higher operational costs for general systems when emulating specialized behaviors, underscoring the necessity of aligning hardware specificity with workload predictability to maximize throughput per unit energy.

Performance Assessment

Key Metrics and Benchmarks

The primary metric for assessing supercomputer performance remains floating-point operations per second (FLOPS), quantified as Rpeak—the theoretical maximum derived from hardware specifications such as clock frequency, core count, and vector capabilities—and Rmax, the achievable performance measured via the High Performance LINPACK (HPL) benchmark, which solves dense systems of linear equations. HPL emphasizes sustained arithmetic throughput on regular, compute-bound kernels, often achieving 50-80% of Rpeak on leading systems, but its focus on dense matrices favors architectures optimized for such patterns over broader workload realism. To address HPL's limitations in capturing memory-bound operations prevalent in scientific simulations, the High Performance Conjugate Gradient (HPCG) benchmark was introduced as a complement, stressing sparse matrix-vector multiplications, irregular memory access, and higher memory bandwidth demands (typically measured in TB/s). HPCG yields substantially lower scores—often 5-10% of HPL equivalents—highlighting architectural imbalances where peak FLOPS overstate efficacy for codes with unstructured grids or iterative solvers, as these expose bottlenecks in data movement rather than pure computation. For AI-driven workloads, MLPerf benchmarks evaluate training and inference throughput on representative models like deep neural networks, incorporating end-to-end metrics such as time-to-train to fixed accuracy or samples-per-second, which better reflect tensor operations, data loading, and scalability in heterogeneous GPU/accelerator environments. Supercomputer evaluations distinguish capability computing, which maximizes single-job peak performance for grand-challenge problems requiring massive parallelism, from capacity computing, which prioritizes aggregate throughput for numerous smaller, concurrent tasks; most systems target capability, yet real-world utilization often blends both, with HPL-derived metrics underemphasizing capacity factors like job queuing and I/O contention. Critically, these benchmarks inadequately represent full-system realities: HPL and HPCG prioritize compute and memory performance but neglect sustained I/O rates for petabyte-scale datasets and resilience, where mean time between failures drops to minutes at exascale, rendering arithmetic peaks irrelevant without resilient checkpointing and recovery mechanisms. Empirical analyses show HPL can mislead by enabling "stunt" optimizations that excel in dense benchmarks but falter on irregular, production codes with sparse data dependencies. Thus, holistic assessment demands integrating bandwidth measures (e.g., STREAM benchmarks for memory) and resilience proxies, as pure FLOPS metrics risk prioritizing theoretical ceilings over causal determinants of workload solvability.
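An illustrative calculation of the benchmark ratios described above, using hypothetical round numbers chosen to fall inside the stated ranges rather than any system's published results.

```python
# Hypothetical system figures (assumed for illustration only).
rpeak = 2.0e18        # theoretical peak, FLOPS
hpl_rmax = 1.3e18     # sustained HPL result
hpcg = 7.0e16         # HPCG result

print(f"HPL efficiency : {hpl_rmax / rpeak:.0%} of Rpeak")     # dense, compute-bound
print(f"HPCG vs Rpeak  : {hpcg / rpeak:.1%}")                  # sparse, bandwidth-bound
print(f"HPCG vs HPL    : {hpcg / hpl_rmax:.1%}")               # gap exposes memory limits
```

The two ratios quantify the imbalance discussed above: a machine can convert most of its peak into HPL score while sustaining only a few percent of it on bandwidth-bound sparse kernels.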

TOP500 Rankings and Their Evolution

The TOP500 project ranks the 500 most powerful non-distributed supercomputers worldwide based on their measured performance in the High-Performance LINPACK (HPL) benchmark, which solves dense systems of linear equations to report sustained double-precision floating-point operations per second (Rmax). Launched in June 1993 at the International Supercomputing Conference in Mannheim, Germany, the list has been updated biannually in June and November, relying on voluntary submissions from system owners who run the portable HPL implementation on their hardware. This methodology provides a standardized, comparable metric for peak computational capability, though submissions require verifiable evidence of runs. In the June 2025 edition, the 65th list, El Capitan at Lawrence Livermore National Laboratory (LLNL) in the United States retained the number-one position with 1.742 exaFLOPS Rmax, utilizing HPE Cray EX255a architecture with AMD EPYC CPUs and Instinct MI300A accelerators interconnected via Slingshot-11. The top three systems—El Capitan, Frontier (1.353 exaFLOPS), and Aurora (1.012 exaFLOPS)—are all U.S. Department of Energy (DOE) installations and the only machines on the list exceeding 1 exaFLOPS Rmax, underscoring American leadership in sustained high-performance computing deployment. Evolutionary trends reveal a pronounced shift toward accelerator-augmented designs, with GPUs or specialized processors comprising over 95% of the top systems' compute capacity by 2025, as vendors optimize for HPL's memory-bound, bandwidth-intensive kernel that benefits from high-throughput vector units. Processor family analyses across lists show dominance by x86 CPUs from Intel and AMD paired with NVIDIA and AMD accelerators, correlating with exponential Rmax growth from teraFLOPS-scale in the late 1990s to exaFLOPS today. Concurrently, China's representation has declined sharply post-2019 U.S. export controls on advanced semiconductors, with submissions ceasing around 2022; the country previously held over 200 entries but now accounts for fewer than 100, attributed to operators withholding data amid hardware access restrictions and geopolitical scrutiny rather than outright capability loss. Critiques of the TOP500 center on HPL's narrow focus on dense linear algebra, which privileges systems engineered for artificial peak performance—often at the expense of balance for sparse matrices, iterative solvers, or irregular data access patterns common in scientific simulations—potentially misrepresenting utility for non-LINPACK workloads like climate modeling or data analytics. This benchmark bias encourages over-investment in FLOPS-maximizing hardware, underemphasizing metrics such as energy efficiency (addressed separately by the Green500 list) or Graph500 for graph traversal, prompting proposals for complementary standards like HPCG to better capture memory subsystem efficacy.

Critiques of Measurement Standards

The High-Performance Linpack (HPL) benchmark, which underpins TOP500 rankings by measuring sustained dense linear algebra performance, has faced scrutiny for its narrow focus on compute-bound, regular workloads that fail to capture the diverse demands of most supercomputer applications. HPL's emphasis on O(n³) floating-point operations with O(n²) data movements prioritizes peak flops over memory-bound or irregular patterns, rendering it unrepresentative of simulations involving sparse matrices, graph traversals, or iterative solvers common in fields like data analytics and bioinformatics. This mismatch arises because real-world codes often exhibit poor data locality and memory bandwidth limitations, where HPL's artificial regularity allows optimizations irrelevant to production runs. Proposed alternatives address these gaps by targeting irregular and data-intensive kernels; for instance, the Graph500 benchmark evaluates breadth-first search on scale-free graphs, stressing random memory accesses and communication overheads akin to those in social network analysis or knowledge graphs, which HPL ignores. Similarly, HPCG (High-Performance Conjugate Gradient) incorporates sparse matrix-vector multiplications, reflecting the bandwidth sensitivity of solvers in partial differential equations, and has shown orders-of-magnitude lower efficiencies on top-ranked systems compared to HPL, highlighting architectural mismatches. These benchmarks reveal that HPL efficiencies often exceed 50% of Rpeak, while Graph500 or HPCG drop below 1%, underscoring HPL's detachment from causal factors like interconnect latency in scaled systems. Benchmark gaming exacerbates these issues, as vendors tune hardware and software stacks—such as overprovisioning accelerators for HPL's dense kernels—to maximize Rpeak submissions, even when those components remain idle in operational workloads. This practice inflates theoretical peaks without proportional gains in sustained performance, as evidenced by cases where GPU-heavy systems achieve high scores but deliver negligible throughput for non-Linpack tasks due to unoptimized drivers or data staging. Such optimizations can yield 20-50% divergences between benchmarked and audited real-world efficiencies, driven by parameter tuning that exploits HPL's sensitivity to block sizes and pivoting strategies rather than general-purpose efficiency. Advocates for holistic evaluation argue that compute-centric metrics like HPL overlook systemic factors determining scientific value, including job queue throughput, I/O performance, and allocation efficiency, which better predict research output than raw FLOPS. Empirical analyses indicate weak correlations between TOP500 positions and productivity metrics like publications or citations per petaflop, as productivity hinges on software ecosystems and user training rather than isolated kernel speed. Integrating these—via broader benchmark suites or application-specific proxies—would expose trade-offs, such as favoring vector units over tensor cores mismatched to legacy codes, fostering architectures aligned with causal workload realities over benchmark artifacts.

Energy and Thermal Management

Power Consumption Patterns

Supercomputer power consumption has escalated dramatically with performance scaling, from the Cray-1's 115 kW draw in 1976 to the Frontier system's approximately 21 MW in 2022. This progression reflects the physics of increased transistor density and clock speeds, where total energy dissipation rises despite per-device efficiency gains following the breakdown of Dennard scaling. By 2025, leading systems typically consume 20-30 MW at peak, while the median across ranked machines approaches 3 MW, driven by the aggregation of millions of cores and accelerators in dense configurations. The primary causal mechanism is resistive (Joule) heating in transistors and interconnects, where losses from electron flow—governed by P = I^2R—combine with dynamic power as switching activity intensifies. Transistor-level dissipation arises from capacitive charging (CV^2f) and leakage currents, exacerbated at nanoscale nodes where voltage scaling limits yield diminishing efficiency returns. Interconnects contribute substantially, often 20-30% of total power in large-scale systems, due to capacitive loading and signal propagation delays requiring high-bandwidth, low-latency fabrics like InfiniBand or Slingshot. The Landauer limit, a theoretical minimum of kT ln 2 per bit erasure, remains practically irrelevant, as operational energies exceed it by orders of magnitude owing to irreversible heat generation and non-ideal dissipation. Exascale designs highlight the tension between performance targets and power budgets: the U.S. Department of Energy aimed for under 20 MW to achieve 1 EFLOPS, yet Frontier delivers 1.1 EFLOPS sustained at around 21 MW, marginally exceeding the envelope through GPU efficiencies but underscoring scaling's thermodynamic constraints. Empirical data from HPL benchmarks show systems operating at 60-70% of peak power, implying real workloads may draw less but still aggregate to MW-scale totals for top-tier machines.
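An order-of-magnitude sketch of the dissipation terms named above (dynamic CV^2f power and the Landauer bound). The chip parameters are illustrative assumptions, not measurements of any real device.

```python
from math import log

k_B = 1.380649e-23          # Boltzmann constant, J/K
T = 300.0                   # operating temperature, K
landauer_j_per_bit = k_B * T * log(2)            # ~2.9e-21 J per bit erased

# Dynamic switching power: P = alpha * C * V^2 * f, summed over switching nodes.
alpha = 0.1                 # activity factor (assumed)
C = 1e-15                   # effective capacitance per node, farads (assumed)
V = 0.8                     # supply voltage, volts (assumed)
f = 2.0e9                   # clock frequency, Hz (assumed)
nodes = 2e9                 # switching nodes on the die (assumed)

dynamic_w = alpha * C * V**2 * f * nodes
energy_per_switch = C * V**2                     # joules per switching event

print(f"Landauer bound     : {landauer_j_per_bit:.1e} J/bit")
print(f"Per-switch energy  : {energy_per_switch:.1e} J "
      f"(~{energy_per_switch / landauer_j_per_bit:.0e}x above the bound)")
print(f"Chip dynamic power : {dynamic_w:.0f} W")
```

Even with these generous assumptions the per-switch energy sits several orders of magnitude above kT ln 2, which is why the Landauer limit plays no practical role in the MW-scale budgets discussed above.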

Cooling Innovations and Challenges

Early supercomputers relied on air cooling, with forced-air circulation managing heat from vacuum tubes and early transistors, but this approach proved inadequate for scaling beyond kilowatt-scale racks due to air's limited heat-transfer coefficients. Transitioning to liquid cooling methods addressed these limitations; direct-to-chip (DTC) cooling, where coolant flows through microchannels attached to processors, became prevalent in modern HPC deployments for its ability to handle heat fluxes up to several hundred watts per chip by minimizing thermal resistance at the source. Immersion cooling submerges entire server boards in non-conductive fluids, either single-phase (liquid remains liquid) or two-phase (fluid boils to vapor for enhanced absorption), enabling dissipation of densities exceeding 1 kW/cm² as demonstrated in experimental intra-chip two-phase systems targeting benchmarks for future microprocessors. Two-phase variants leverage phase change for superior efficiency in ultra-high power scenarios, though they require specialized fluids like fluorinated refrigerants with boiling points around 50°C to prevent hotspots. Cooling systems can consume up to 40% of total facility power in less efficient installations, contributing to power usage effectiveness (PUE) values often exceeding 1.2 in dense deployments despite theoretical ideals closer to 1.1, as overhead for pumps, heat exchangers, and redundancy drives inefficiencies. Leak risks pose operational challenges, with incidents of fluid breaches damaging multimillion-dollar GPU arrays in liquid-cooled environments, underscoring vulnerabilities in fittings and seals under continuous high-pressure operation. Innovations like Microsoft's Project Natick explored submerged pods leveraging ocean water for natural convection, yielding empirical reductions in hardware failures and cooling energy through ambient submersion, though scalability remains constrained at facility scales approaching 100 MW where thermal management compounds with power distribution limits. Such approaches highlight trade-offs in feasibility, as exascale systems push boundaries where air cooling alone fails and liquid infrastructure demands precise fluid compatibility to avoid corrosion or dielectric breakdown.
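The PUE arithmetic referenced above in a minimal form; the power figures are assumptions for a hypothetical liquid-cooled facility, not reported values.

```python
# PUE = total facility power / IT equipment power.
it_power_mw = 25.0          # compute, memory, network (assumed)
cooling_mw = 4.5            # pumps, chillers, heat exchangers (assumed)
other_overhead_mw = 1.5     # power conversion losses, lighting, etc. (assumed)

total_mw = it_power_mw + cooling_mw + other_overhead_mw
pue = total_mw / it_power_mw
print(f"PUE = {total_mw:.1f} MW / {it_power_mw:.1f} MW = {pue:.2f}")
```

With these assumed overheads the facility lands just above 1.2, illustrating how pump and heat-exchanger power keeps dense deployments from reaching the theoretical ideal near 1.1.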

Empirical Evaluations of Sustainability Claims

Empirical assessments indicate that high-performance computing (HPC) systems, including supercomputers, consume a modest share of electricity relative to their scientific and economic contributions. Data centers as a whole accounted for approximately 1-2% of global electricity use in recent years, with HPC representing a small subset thereof, estimated at under 0.5% of total electricity demand when excluding broader cloud and AI workloads. This contrasts with sectors like aviation, which emit comparable or higher greenhouse gases—around 2.5% of global CO2—yet HPC delivers disproportionate returns through accelerated R&D, such as modeling complex physical processes unattainable via slower alternatives. Claims of outsized environmental harm often overlook these asymmetries, where HPC's predictive power enables breakthroughs that reduce long-term resource demands across industries. Sustainability critiques frequently exaggerate HPC's carbon footprint by isolating operational emissions without accounting for efficiency offsets or downstream benefits. Historical trends show computations per joule in HPC roughly doubling every 18 months, which has outpaced raw power growth and mitigated per-flop emissions over time. For instance, supercomputer simulations have advanced fusion energy research by enabling detailed plasma modeling on facilities like DIII-D, potentially yielding carbon-free power sources that dwarf HPC's energy inputs. Similarly, in drug discovery, HPC-driven molecular simulations have accelerated candidate screening by factors of 10, shortening development timelines and enabling therapies that enhance human health efficiencies. These applications justify energy use under causal analysis, as alternatives like empirical trial-and-error would consume more cumulative resources without comparable precision. Integration of renewables further tempers sustainability concerns for modern systems. The JUPITER exascale supercomputer, operational since September 2025, operates entirely on renewable sources, incorporating advanced cooling and waste-heat reuse to achieve 60 gigaflops per watt—among the highest efficiencies globally. Private initiatives, such as xAI's Colossus cluster, demonstrate agility in deploying liquid cooling for enhanced efficiency, avoiding the inefficiencies of heavily subsidized public grids often critiqued for bias toward intermittent renewables over dispatchable power. Overstated alarms, prevalent in media and advocacy sources prone to environmental alarmism, ignore such offsets; for example, HPC's role in optimizing energy systems via grid simulations yields net reductions in sectoral emissions, prioritizing verifiable outputs over unquantified externalities.

Software Infrastructure

Operating Systems and Kernel Adaptations

Nearly all supercomputers listed on the TOP500 rankings as of June 2025 operate using Linux-based operating systems, with the Linux family accounting for over 99% of systems. Common distributions include SUSE Linux Enterprise Server for HPE Cray systems' service nodes, Red Hat Enterprise Linux (RHEL) variants customized for high-performance computing (HPC), and specialized environments like Tri-Lab Operating System Software (TOSS) deployed on U.S. Department of Energy machines such as El Capitan. These choices prioritize stability, scalability, and minimal overhead over consumer-oriented features, enabling efficient management of thousands of nodes and millions of cores. Kernel modifications focus on optimizing for non-uniform memory access (NUMA) architectures prevalent in large-scale clusters, where memory access latency varies significantly with data placement. Adaptations include enhanced NUMA balancing to localize memory allocations and reduce remote access penalties, as well as support for huge pages—typically 2MB or 1GB in size—to decrease translation lookaside buffer (TLB) misses and page-table overhead in memory-intensive workloads. Transparent huge page (THP) support in the Linux kernel automates this for eligible processes, improving performance in NUMA systems by consolidating small pages into larger contiguous blocks without manual intervention. Workload management integrates tightly with the OS kernel via tools like SLURM (Simple Linux Utility for Resource Management), which handles job scheduling, resource allocation, and fault tolerance across clusters. SLURM powers approximately 60% of TOP500 supercomputers, leveraging kernel features for efficient process migration and priority queuing to minimize contention in environments with hundreds of thousands of cores. Its design emphasizes low-latency signaling and cgroup integration to enforce resource isolation, supporting scalability to over 10,000 nodes. At extreme scales exceeding 100,000 cores, kernel-induced challenges arise, including elevated context switch overhead from scheduler interruptions and OS jitter that disrupts tightly synchronized parallel computations. These issues stem from shared kernel structures like runqueues and locks, which amplify contention in many-core domains, potentially degrading application performance by introducing variability in execution times. Mitigations involve lightweight kernel variants or disabling non-essential interrupts to prioritize application uptime, targeting availability levels approaching four nines (99.99%) through redundant scheduling and rapid failure recovery. Containerization adaptations, such as Singularity (now Apptainer), address environment portability by encapsulating user-space environments without requiring root privileges, crucial for multi-tenant HPC systems. These containers bind to the host kernel while isolating dependencies, enabling consistent deployments across heterogeneous hardware and reducing setup variability in scientific workflows. Performance overhead remains low, often under 15% for compute-bound tasks, preserving native kernel access for MPI communications. Historically, proprietary systems like Cray's UNICOS—a UNIX derivative introduced in 1985 for vector processors—evolved to support massively parallel architectures but transitioned to the Linux-based Cray Linux Environment (CLE) by the early 2010s for broader compatibility and community-driven optimizations. This shift facilitated integration with standard HPC tools while retaining reliability features like fault-tolerant booting, reflecting a broader move toward kernels tuned for exascale reliability over proprietary OS development.
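A minimal sketch of requesting huge-page backing for a large buffer from user space on Linux, complementing the automatic THP behavior described above; it relies on `mmap.madvise` and the `MADV_HUGEPAGE` constant, which are only exposed where the platform defines them (Linux, Python 3.8+), hence the guard.

```python
import mmap

SIZE = 1 << 30                      # 1 GiB anonymous mapping
buf = mmap.mmap(-1, SIZE)           # backed by anonymous memory

if hasattr(mmap, "MADV_HUGEPAGE"):
    # Ask the kernel to back this range with huge pages, reducing TLB misses
    # for the large, contiguous working sets typical of HPC codes.
    buf.madvise(mmap.MADV_HUGEPAGE)

buf[:8] = b"touched!"               # fault in the first page
buf.close()
```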

Parallel Programming Models

Parallel programming models in supercomputing address the need for explicit synchronization and data locality in distributed-memory environments, where processes operate independently but must coordinate to avoid race conditions and ensure causal consistency. The Message Passing Interface (MPI), first standardized in June 1994 by the MPI Forum, dominates for its portability across heterogeneous clusters, using explicit send-receive semantics and collectives to implement the single-program, multiple-data (SPMD) execution model, which facilitates load-balanced distribution over thousands of nodes. OpenMP, specified initially for Fortran in October 1997, augments this with directive-based shared-memory parallelism, enabling hybrid MPI-OpenMP strategies that exploit node-level multi-core coherence while deferring inter-node communication to MPI. Partitioned Global Address Space (PGAS) paradigms, exemplified by Unified Parallel C (UPC)—whose specification evolved from Berkeley Lab prototypes in the late 1990s and reached version 1.2 by 2005—provide a virtually shared address space with private partitions, supporting one-sided put/get operations that bypass explicit handshakes, thus reducing synchronization overhead in remote memory access compared to MPI's two-sided model. For GPU-accelerated nodes, Open Accelerators (OpenACC) directives, introduced via industry collaboration in November 2011 with initial specifications in 2012, annotate host code for automatic data transfer and kernel launch, abstracting low-level accelerator programming while preserving host-directed execution. These models trade explicit control for scalability: SPMD via MPI excels in homogeneous, communication-intensive workloads but incurs overhead from collective barriers, often yielding strong scaling limited by Amdahl's law—where speedup approaches 1 over the serial fraction—necessitating code refactoring for serial fractions below 5% to exceed 10x gains on petascale systems. Hybrid variants mitigate distributed-memory bottlenecks within nodes but amplify tuning complexity, as mismatched thread counts can degrade efficiency by introducing oversubscription or underutilization; Gustafson's law counters this by advocating problem-size scaling, enabling weak-scaling efficiencies above 90% for data-parallel tasks where communication scales sublinearly with processors. Recent evolutions prioritize abstraction from hardware details, as in the Legion system from Stanford, whose core model debuted in a 2012 paper, employing logical regions and task launches to automate partitioning and coherence without programmer-specified mappings, thus supporting dynamic heterogeneity in exascale prototypes. For AI-driven supercomputing, PyTorch Distributed—building on MPI-like backends since PyTorch's 2017 inception—adapts SPMD to tensor sharding and all-reduce operations, facilitating model parallelism across nodes while handling irregular data dependencies through asynchronous communication.
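A minimal SPMD sketch in the MPI style described above, using the third-party mpi4py package (assumed installed alongside an MPI library): every rank runs the same program, computes a partial sum over its block of the domain, and a collective reduction combines the results.

```python
# Run with, e.g.: mpiexec -n 4 python spmd_sum.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

N = 1_000_000                              # global problem size
start = rank * N // size                   # this rank's slice (block distribution)
stop = (rank + 1) * N // size

local_sum = sum(float(i) for i in range(start, stop))
global_sum = comm.allreduce(local_sum, op=MPI.SUM)   # collective reduction

if rank == 0:
    print(f"global sum = {global_sum:.0f} across {size} ranks")
```

The single collective here stands in for the barriers and reductions whose cost, per the Amdahl's-law discussion above, ultimately bounds strong scaling.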

Essential Tools and Optimization Frameworks

Debugging parallel applications on supercomputers requires specialized tools capable of handling thousands of processes and threads across distributed nodes. TotalView, developed by Perforce (formerly Rogue Wave Software), supports source-level debugging for serial and parallel programs in languages including C, C++, Fortran, and Python, enabling features like thread control and memory-error detection on HPC systems such as those at Lawrence Livermore National Laboratory. Similarly, Arm DDT (formerly Allinea DDT) facilitates multi-process and multi-thread debugging for up to 2048 processors, supporting MPI, OpenMP, OpenACC, and GPU code, with deployment on facilities like NERSC for scalable fault isolation and core file analysis. These debuggers enhance developer productivity by reducing debugging time from days to hours in complex simulations, as evidenced by their adoption in production HPC environments. Performance profiling identifies computational bottlenecks in supercomputer workloads, where tools like TAU and Vampir provide instrumented tracing and visualization. TAU, from the University of Oregon, offers portable profiling for parallel programs in Fortran, C, C++, UPC, Java, and Python, capturing metrics such as execution time, memory, I/O, and hardware counters, with export capabilities to Vampir for timeline analysis. Vampir complements this by visualizing trace data to reveal message-passing patterns and load imbalances in MPI applications, aiding in optimizations that can yield 2-5x speedups by targeting communication overheads, as reported in empirical studies on leadership-class systems. Autotuners such as ATLAS empirically tune BLAS routines for specific hardware, achieving up to 1.5x performance gains over vendor libraries in linear algebra kernels on ARM-based clusters, by searching parameter spaces for cache-optimal block sizes and loop unrolling factors. GPU acceleration frameworks like NVIDIA's CUDA and AMD's ROCm enable accelerator offloading on supercomputers, with HIP providing CUDA-like syntax for portability across vendors. Porting atmospheric models to HIP has demonstrated significant speedups, such as 10x or more in physics parameterization schemes on GPU clusters, by leveraging vectorized operations and memory coalescing. Emerging trends include machine learning-guided autotuning, as in MLKAPS, which uses decision trees and adaptive sampling to optimize HPC kernels, reducing tuning overhead while matching exhaustive search performance. Integration with containers like Apptainer (formerly Singularity) further supports portability, encapsulating optimized binaries and dependencies for reproducible deployment across supercomputer architectures without root privileges.
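A toy illustration of the empirical-search idea behind autotuners such as ATLAS: time a parameterized kernel over candidate block sizes and keep the fastest. The kernel and block sizes here are arbitrary stand-ins for illustration, not ATLAS internals.

```python
import time
import numpy as np

def blocked_sum(a, block):
    """Sum a large array in fixed-size chunks (the tunable parameter)."""
    total = 0.0
    for i in range(0, a.size, block):
        total += a[i:i + block].sum()
    return total

a = np.random.rand(16_000_000)
best = None
for block in (1_024, 8_192, 65_536, 524_288):
    t0 = time.perf_counter()
    blocked_sum(a, block)
    elapsed = time.perf_counter() - t0
    print(f"block={block:>7}: {elapsed * 1e3:6.1f} ms")
    if best is None or elapsed < best[1]:
        best = (block, elapsed)

print(f"selected block size: {best[0]}")
```

Real autotuners apply the same measure-and-select loop to compiled kernels with far larger parameter spaces, which is where the ML-guided sampling mentioned above pays off.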

Core Applications

Scientific and Engineering Simulations

Supercomputers facilitate high-resolution simulations of physical phenomena by numerically solving systems of partial differential equations (PDEs) that model laws such as Navier-Stokes for fluids or Einstein's field equations for general relativity, often requiring sustained performance exceeding 10^18 floating-point operations per second (exaFLOPS) to achieve feasible resolutions. These computations address inverse problems, where parameters like material properties or initial conditions are inferred from observational data, demanding iterative optimizations that scale with grid points—typically necessitating petaFLOPS or exaFLOPS for problems involving billions of degrees of freedom. Such capabilities arise from parallel architectures distributing workloads across thousands of nodes, enabling predictions grounded in first-principles physics rather than empirical correlations alone. In climate modeling, supercomputers like Frontier at Oak Ridge National Laboratory support codes such as the Simple Cloud Resolving E3SM Atmosphere Model (SCREAM), which performed 40-year global simulations at 3-km resolution using 32,768 GPUs, resolving cloud processes previously parameterized and reducing precipitation biases observed in coarser models. This earned the 2023 Gordon Bell Prize for climate modeling, demonstrating how exascale compute accelerates multi-decadal forecasts by integrating atmosphere, ocean, and land interactions at scales capturing convective dynamics. The U.S. Department of Energy (DOE) allocates millions of node-hours annually through programs like the ASCR Leadership Computing Challenge (ALCC), with 38 million awarded in 2025 to projects including such simulations, prioritizing verifiable advancements in predictive accuracy over unsubstantiated claims of precision. Astrophysics benefits from adaptive mesh refinement (AMR) codes like GRChombo, which simulates relativistic phenomena such as black hole mergers on supercomputers including DiRAC and SuperMUC-NG, extracting gravitational-wave signals matching observed detections through full 3+1 evolution. These runs leverage block-structured AMR to focus resolution on horizons and emitted waves, requiring supercomputing to handle nonlinear PDE coupling and numerical stability over dynamical timescales, with applications to probing inflation-era perturbations. NSF and DOE facilities provide core-hour grants, as seen in sustained allocations for numerical-relativity consortia, enabling predictions in strong-field regimes inaccessible to analytic methods. Materials science employs density functional theory (DFT) for quantum mechanical simulations of electronic structure, where computational cost scales as O(N^3) to O(N^4) with system size N, compelling supercomputer use for defects in solids or surfaces exceeding hundreds of atoms. DOE-supported efforts, such as those at national laboratories, apply DFT to energy materials like battery cathodes, predicting properties via Kohn-Sham equations solved on parallel clusters to inform synthesis and reduce trial-and-error experimentation. Earthquake engineering exemplifies verifiable gains, with exascale simulations on DOE systems modeling fault dynamics over 700,000 simulated years, revealing ground motion amplifications tied to basin effects and enhancing structural designs against magnitudes up to 8.0. Such DOE/NSF allocations, totaling billions of core-hours over decades for seismic consortia like SCEC, yield causal insights into rupture dynamics, though persistent uncertainties in fault geometry and material heterogeneity limit deterministic prediction.
While these simulations accelerate discovery—e.g., refining sub-grid parameterizations or validating theoretical models—intrinsic limitations persist, including numerical approximations in closures and sensitivity to initial conditions in chaotic systems, underscoring that computational scale amplifies resolution but does not eliminate epistemic gaps in sub-scale physics. Peer-reviewed allocations emphasize empirical validation against observations, mitigating biases in model tuning prevalent in less rigorous academic outputs.
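A toy serial sketch of the PDE-solving pattern this section describes: an explicit finite-difference update of the 2-D heat equation on a small grid. Production codes apply the same stencil to billions of grid points distributed across many nodes; the grid size and step count here are arbitrary.

```python
import numpy as np

nx = ny = 128
alpha = 1.0                              # thermal diffusivity (assumed units)
dx = 1.0 / nx
dt = 0.2 * dx**2 / alpha                 # satisfies the explicit stability limit
u = np.zeros((nx, ny))
u[nx // 2, ny // 2] = 1.0                # point heat source

for _ in range(500):
    # Five-point Laplacian stencil on the interior, then a forward-Euler update.
    lap = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
           - 4 * u[1:-1, 1:-1]) / dx**2
    u[1:-1, 1:-1] += alpha * dt * lap

print(f"peak temperature after 500 steps: {u.max():.4f}")
```

The stencil's nearest-neighbor data dependence is exactly what drives the halo exchanges and interconnect requirements discussed in the architecture sections above.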

Military and Intelligence Operations

Supercomputers play a pivotal role in nuclear stockpile stewardship, enabling simulations of weapon performance and aging without physical testing, as mandated by the U.S. Comprehensive Test Ban Treaty framework. The Accelerated Strategic Computing Initiative (ASCI), launched by the U.S. Department of Energy's Defense Programs in 1995, developed massively parallel supercomputing capabilities to model nuclear weapons designs and effects, supporting verifiable deterrence amid proliferation risks. Its successor efforts, including the PathForward program initiated around 2017, advanced co-design for exascale systems to enhance predictive accuracy across the nuclear lifecycle. The El Capitan supercomputer, deployed at Lawrence Livermore National Laboratory and benchmarked at 1.742 exaFLOPS in late 2024, exemplifies this, providing the National Nuclear Security Administration (NNSA) with unprecedented modeling for stockpile safety, security, and reliability. In intelligence operations, supercomputers facilitate signals intelligence (SIGINT) processing and cyber simulations by handling vast datasets for real-time analysis and threat modeling, though much remains classified. Advanced computing underpins decryption, pattern detection in encrypted communications, and defensive cyber exercises, contributing to advantages in contested domains. The Department of Defense's High Performance Computing Modernization Program (HPCMP) allocates resources for such tasks, enabling scalable simulations that reduce empirical testing needs and inform operational decisions. Military applications extend to hypersonics modeling, where supercomputers simulate aerothermodynamics, propulsion, and material responses at Mach 5+ speeds, accelerating development cycles. The Air Force Research Laboratory's Raider supercomputer, introduced in 2023, processes years of data in days for design validation, supporting hypersonic weapons programs. These capabilities yield strategic edges, as evidenced by HPCMP contributions to offensive hypersonic fielding, with returns manifested in cost savings and deterrence efficacy over proliferation alternatives. Critics highlight opacity in classified applications, yet empirical outcomes, such as sustained U.S. nuclear certification without tests since 1992, affirm their security value.

AI and Machine Learning Workloads

Supercomputers have become essential for training and inference of large-scale AI models, which demand unprecedented computational intensity due to the multiplicative scaling of operations with model size and dataset volume. For instance, training GPT-4 is estimated to have required approximately 2 × 10^25 floating-point operations (FLOP), a figure derived from estimates based on parameter counts, training data volumes, and hardware efficiency metrics. This scale exceeds traditional high-performance computing (HPC) simulations, necessitating architectures optimized for matrix multiplications and low-precision arithmetic to handle models with trillions of parameters. Key distinctions in AI workloads involve parallelism strategies tailored to supercomputer topologies. Data-parallel training distributes identical model copies across nodes, each processing disjoint data batches, with gradients aggregated via all-reduce operations; this suits moderate-sized models but incurs communication overhead on large clusters. Model-parallel approaches partition the model itself—e.g., layers or attention heads—across devices, reducing per-node memory but increasing inter-node bandwidth demands, often combined in hybrids like pipeline or tensor parallelism for models exceeding single-GPU capacity. GPUs dominate due to tensor core efficiency; the NVIDIA H100 delivers up to 3.958 PFLOPS in FP8 precision for sparse operations, enabling 4× faster training over prior generations by exploiting reduced numerical fidelity without significant accuracy loss. Prominent examples include Microsoft's Azure Eagle supercomputer, which achieved record GPT-3 training times in MLPerf benchmarks using 14,400 networked GPUs at 561 PFLOPS peak, supporting fine-tuning of larger successors. Private initiatives like xAI's Colossus cluster, comprising 100,000 H100 GPUs (expanded to 200,000 by late 2024), prioritize AI-exclusive workloads with liquid cooling and high-bandwidth networking, delivering aggregate FP8 performance in the exaFLOPS range for model development. Recent trends reflect a pivot from HPC-dominant systems to AI-specialized clusters, with AI supercomputer performance doubling every nine months amid rising power and cost demands, outpacing public TOP500 lists where private deployments lead in scale. This shift emphasizes GPU density over CPU versatility, driven by inference needs for real-time applications and the convergence of AI training with distributed storage for petabyte-scale datasets.
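A rough sketch of how training-compute figures like the estimate above are typically derived, using the common ~6 FLOPs-per-parameter-per-token approximation for dense transformers. The parameter count, token count, GPU count, and sustained rate below are illustrative assumptions, not figures for any named model or cluster.

```python
params = 1.0e12            # model parameters (assumed)
tokens = 3.0e12            # training tokens (assumed)
flops_per_param_token = 6  # forward + backward pass approximation

total_flops = flops_per_param_token * params * tokens
print(f"estimated training compute: {total_flops:.1e} FLOP")      # ~1.8e25

# Wall-clock on a hypothetical GPU cluster at an assumed sustained rate:
gpus = 20_000
sustained_per_gpu = 4.0e14     # ~400 TFLOPS effective mixed-precision (assumed)
days = total_flops / (gpus * sustained_per_gpu) / 86_400
print(f"~{days:.0f} days on {gpus:,} GPUs at the assumed utilization")
```

The arithmetic makes the dependence explicit: compute grows with the product of parameters and tokens, which is why clusters of the scale described above are required to keep training times to weeks.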

Commercial and Economic Analyses

In the commercial sector, supercomputers enable profit-oriented applications such as reservoir simulations in energy exploration, where the Discovery 6 system, deployed in 2025, processes seismic data four times faster than its predecessor to map oil and gas deposits, reducing exploration risks and accelerating field development decisions. Earlier, ExxonMobil achieved a record in 2017 by simulating reservoir scenarios on 716,800 processors, generating outputs thousands of times faster than industry norms and enabling rapid evaluation of development options to optimize recovery. Financial institutions leverage supercomputing for Monte Carlo simulations to model risk scenarios and price complex derivatives, with high-performance computing (HPC) systems handling millions of probabilistic paths to forecast outcomes under uncertainty, thereby supporting quicker portfolio adjustments and hedging. These applications yield returns through process efficiencies, such as improved predictive accuracy that minimizes capital misallocation, though quantifying precise ROI remains challenging due to proprietary models. In aerospace and automotive engineering, manufacturers employ supercomputers for aerodynamic simulations optimizing designs, achieving up to 1% gains in fuel efficiency that translate to competitive cost reductions. Logistics benefits from HPC-driven route planning and demand forecasting, enabling firms to cut delays and costs via large-scale scenario testing. Private-sector adoption has surged, with companies controlling 80% of AI-oriented GPU clusters by 2025, up from 40% in 2019, fueled by systems like NVIDIA's DGX platforms that integrate hardware and software for enterprise-scale computations. The global supercomputers market, increasingly private-driven, expanded to USD 7.9 billion in 2024 and is projected to reach USD 18.03 billion by 2033, emphasizing efficiency metrics over raw performance for cost-effective scaling. However, intellectual property protections hinder data sharing across firms, limiting collaborative efficiencies despite shared computational paradigms.
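A minimal Monte Carlo sketch of the risk-simulation pattern described above: simulate many price paths under geometric Brownian motion and read off a value-at-risk estimate. The spot price, drift, volatility, and path count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
paths, horizon_days = 1_000_000, 10
s0, mu, sigma = 100.0, 0.05, 0.2          # spot, drift, annual volatility (assumed)
dt = 1.0 / 252                            # one trading day

z = rng.standard_normal((paths, horizon_days))
log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
terminal = s0 * np.exp(log_returns.sum(axis=1))

pnl = terminal - s0
var_99 = -np.percentile(pnl, 1)           # 99% value-at-risk over the horizon
print(f"99% 10-day VaR per unit: {var_99:.2f}")
```

Each path is independent, so the workload parallelizes almost perfectly across nodes, which is why financial HPC deployments scale path counts into the billions for portfolio-wide runs.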

Distributed Computing Extensions

Grid and Volunteer Networks

Grid computing extends supercomputing capabilities by federating distributed resources across institutions, enabling resource sharing for large-scale scientific workloads. The European Grid Infrastructure (EGI), established in 2010, exemplifies this approach, aggregating over 1 million CPU cores from data centers worldwide to support more than 1.6 million batch computing jobs per day as of recent assessments. This infrastructure facilitates research in fields like high-energy physics and climate modeling, where tasks can be partitioned across heterogeneous sites without requiring centralized ownership. Volunteer computing networks, conversely, leverage idle cycles from public volunteers' devices via middleware like BOINC, launched in 2002 at the University of California, Berkeley. Projects such as Folding@home, which simulates protein dynamics for biomedical research, demonstrated the paradigm's potential by attaining a peak of 470 petaFLOPS in March 2020, surpassing the then-top supercomputer Summit's roughly 200 petaFLOPS during intensified COVID-19 studies. Similarly, SETI@home analyzed radio-telescope data for extraterrestrial signals, sustaining around 0.77 petaFLOPS at its height through volunteer contributions. These networks achieve large aggregate throughput at near-zero hardware cost, as volunteers provide compute without dedicated funding, yielding effective resource utilization for independent subtasks. Despite these advantages, heterogeneity in hardware, operating systems, and network conditions across nodes imposes scheduling overheads, reducing overall system coherence compared to homogeneous dedicated clusters. Security vulnerabilities arise from untrusted volunteer endpoints, including risks of malicious results or data tampering, which demand client-side validation and result replication—mechanisms that inflate computational overhead (see the quorum-validation sketch below). Empirical comparisons suggest volunteer setups require approximately 2.8 active nodes to match the reliable output of one dedicated instance, reflecting losses from volunteer churn and variable availability. Energy efficiency lags dedicated supercomputers, with volunteer PCs exhibiting lower throughput per watt due to consumer-grade components and inefficient idle harnessing. Fundamentally, bandwidth latencies and intermittent connectivity preclude viability for tightly coupled simulations requiring frequent inter-node communication, favoring instead applications where tasks execute autonomously. Grid variants like EGI mitigate some of these issues through institutional trust models but still contend with cross-site policy variances, limiting their aggregate efficiency to niches outside latency-critical domains. Thus, while opportunistic for cost-sensitive, throughput-oriented problems, these networks complement rather than supplant dedicated supercomputers for peak performance demands.
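
The result-replication mechanism mentioned above can be sketched as a simple quorum check: the same work unit is sent to several untrusted clients and a result is accepted only when enough returned values agree. The function below is an illustrative toy, not BOINC's actual validator; names, quorum size, and tolerance are assumptions.

```python
# Sketch of quorum-based result validation, as used conceptually in volunteer computing.
# Illustrative only: not BOINC's implementation.
def validate_work_unit(replica_results, quorum=2, tolerance=1e-6):
    """Return the agreed result if at least `quorum` replicas match, else None (re-issue)."""
    buckets = []  # list of [representative_value, count]
    for value in replica_results:
        for bucket in buckets:
            if abs(value - bucket[0]) <= tolerance:   # near-equal floats count as agreement
                bucket[1] += 1
                break
        else:
            buckets.append([value, 1])
    best = max(buckets, key=lambda b: b[1])
    return best[0] if best[1] >= quorum else None

# Example: three replicas, one tampered or faulty result
print(validate_work_unit([3.141592, 3.141593, 9.999]))  # -> 3.141592 (quorum of 2 met)
print(validate_work_unit([3.14, 9.99]))                 # -> None (must be re-dispatched)
```

Replication of this kind is what drives the efficiency penalty noted above: reliable output requires dispatching each task to multiple volunteers.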

Cloud-Based and Hybrid Supercomputing

Cloud-based supercomputing enables organizations to access HPC resources on demand through major providers, avoiding the capital-intensive requirements of dedicated hardware. Amazon Web Services (AWS) offers tools like ParallelCluster, an open-source cluster management solution for deploying and scaling HPC workloads, and the Parallel Computing Service (PCS), a managed offering tailored for supercomputing applications introduced in August 2024. Microsoft provides Azure HPC capabilities, integrating with schedulers like SLURM for parallel processing and supporting GPU-accelerated instances suitable for AI and simulation tasks. Google Cloud and others extend this with custom HPC configurations, allowing users to provision thousands of cores dynamically. These platforms support bursting to high scales via mechanisms like spot instances, which offer preemptible capacity at discounts of up to 90% compared to on-demand pricing, enabling cost-effective handling of peak loads without fixed infrastructure. While not yet achieving sustained exascale performance equivalent to dedicated systems like Frontier, cloud HPC can aggregate resources for petaflop-scale computations, particularly for bursty workloads in AI or scientific modeling. Hybrid supercomputing integrates on-premises systems with cloud resources, directing overflow tasks—such as sporadic simulations or data processing surges—to elastic providers, thereby optimizing utilization of existing hardware. This approach leverages pay-per-use pricing for scalability, reducing total cost of ownership (TCO) by 30-40% for variable workloads through avoidance of idle capacity (a rough cost comparison follows below). Benefits include enhanced flexibility for fluctuating demands and seamless extension of local clusters via cloud bursting, as seen in integrations between SLURM-managed on-premises setups and AWS or Azure. However, drawbacks encompass data egress fees, which can inflate costs for large transfers (often around $0.09 per gigabyte on AWS), potential latency in hybrid data flows, and compliance challenges for regulated sectors requiring data residency. In 2025, trends indicate accelerated growth in AI-focused cloud supercomputing, with providers reporting 15-25% year-over-year increases in AI workloads and organizations prioritizing hybrid models for sustained TCO efficiencies amid variable loads like inference spikes. Adoption is driven by verifiable cost savings for non-constant compute needs, though security risks from multi-environment data movement necessitate robust encryption and access-control protocols.
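
A rough way to reason about the hybrid TCO trade-off is a back-of-the-envelope comparison like the one below; all rates, utilization figures, and the egress price are illustrative assumptions rather than quotes from any provider.

```python
# Back-of-the-envelope comparison of dedicated, on-demand cloud, and spot-based HPC costs.
# All prices and utilization figures are illustrative assumptions, not provider quotes.
def annual_cost(node_hours, hourly_rate, egress_tb=0.0, egress_per_gb=0.09):
    return node_hours * hourly_rate + egress_tb * 1024 * egress_per_gb

peak_node_hours = 200_000        # hypothetical yearly demand, concentrated in bursts
on_demand_rate = 3.00            # $/node-hour, assumed
spot_rate = 0.90                 # ~70% discount, assumed (spot capacity is preemptible)
amortized_onprem = 450_000       # assumed yearly amortization + power for a fixed cluster

print("on-demand cloud :", annual_cost(peak_node_hours, on_demand_rate, egress_tb=50))
print("spot-heavy cloud:", annual_cost(peak_node_hours, spot_rate, egress_tb=50))
print("fixed on-prem   :", amortized_onprem, "(sits idle outside bursts)")
```

The crossover point depends heavily on how bursty the demand is, which is why steady baseline workloads usually stay on owned hardware while spikes burst to the cloud.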

Geopolitical and Economic Realities

State-Sponsored Initiatives Worldwide

The United States Department of Energy (DOE) has spearheaded major supercomputer deployments through its national laboratories, including Frontier at Oak Ridge National Laboratory, which achieved 1.102 exaFLOPS of sustained performance in 2022 as the first exascale system worldwide, and El Capitan at Lawrence Livermore National Laboratory, verified in November 2024 as the fastest supercomputer at 1.742 exaFLOPS sustained (2.79 exaFLOPS peak). These systems, developed under DOE's Exascale Computing Project with investments exceeding $600 million per machine in hardware and integration, prioritize simulations for energy, materials science, and nuclear stockpile stewardship, with U.S. systems comprising about 48% of aggregate TOP500 performance in mid-2025. In Europe, the EuroHPC Joint Undertaking (JU), established in 2018 with €1 billion in initial EU funding matched by member states, coordinates procurement and operation of petascale and exascale machines to foster European sovereignty in high-performance computing. Key systems include LUMI in Finland, operational since 2022 with roughly 375 petaFLOPS and partial EU/national funding of about €200 million, and JUPITER in Germany, Europe's first exascale supercomputer, procured in 2023 with 50% EU and 50% German national financing totaling over €300 million. By October 2025, EuroHPC had expanded to 37 participating states while allocating an additional €55 million for AI-optimized extensions, though critics note potential redundancies in duplicating U.S.-style architectures and flops-per-euro returns lower than U.S. benchmarks. Japan's government, via the Ministry of Education, Culture, Sports, Science and Technology (MEXT), invested ¥110 billion (approximately $750 million) in Fugaku, operational since 2021 at the RIKEN Center for Computational Science in Kobe with 442 petaFLOPS of sustained performance, topping TOP500 lists from 2020 until 2022 before yielding to exascale peers. The successor, FugakuNEXT, announced in 2025 with another roughly $750 million commitment, targets zettaFLOPS-scale performance in select metrics by around 2030 using domestic Arm-based CPUs paired with GPUs, emphasizing national R&D but facing challenges relative to U.S. systems' higher performance per dollar. Other nations pursue targeted programs, such as Singapore's National Supercomputing Centre expanding with $24.5 million in government funding for a new system intended to integrate quantum elements, reflecting a broader trend of subsidies totaling billions globally yet yielding uneven empirical gains in compute efficiency, where U.S.-led designs often achieve superior performance per dollar through scale and private-sector technology integration.

US-China Rivalry in Compute Capacity

The United States maintains a significant lead in verified supercomputing capacity over China, as evidenced by the TOP500 list from June 2025, which ranks three U.S. Department of Energy systems—El Capitan, Frontier, and Aurora—as the world's only confirmed exascale machines, each exceeding 1 exaFLOPS on the High-Performance Linpack benchmark. These systems collectively dominate the top positions, with the U.S. hosting 175 of the 500 fastest supercomputers worldwide, compared to China's 47. U.S. export controls, implemented since October 2022 and expanded through 2024, have restricted China's access to advanced semiconductors and computing hardware, including prohibitions on high-end GPUs and ASML's extreme ultraviolet (EUV) lithography tools essential for cutting-edge chip fabrication. Such measures have curbed upgrades to systems like earlier Tianhe variants reliant on restricted foreign components, preserving the U.S. edge by limiting China's integration of state-of-the-art accelerators. In response, China has accelerated development of indigenous processors, such as the Sunway SW26010-Pro CPU, which reportedly quadruples the performance of its predecessor and enables exaFLOPS-scale theoretical throughput in secretive systems not submitted to TOP500 benchmarks. Domestic alternatives like Phytium and Shenwei chips power machines such as the unverified Tianhe Xingyi, aiming for self-sufficiency amid sanctions, though these lag in efficiency and ecosystem maturity compared to U.S.-accessible NVIDIA and AMD architectures. Despite progress in AI model benchmarks, China trails in overall compute capacity, controlling only about 15% of global AI compute resources versus the U.S.'s 75%, according to analyses emphasizing hardware constraints. These semiconductor restrictions, including Dutch alignment on ASML EUV export bans since 2019, causally sustain the U.S. advantage by denying China tools for the sub-7nm nodes critical to supercomputing density, while fostering parallel hardware ecosystems that risk long-term global fragmentation in standards and interoperability. China's opaque reporting—opting out of full TOP500 participation—further obscures verifiable gaps, but empirical data from submitted systems indicate persistent deficits in sustained performance and scale.

Private Sector Dynamics and Export Restrictions

Private companies have increasingly driven supercomputing advancements, particularly for AI training, through rapid deployment of massive GPU clusters unconstrained by traditional government procurement timelines. xAI's Colossus supercomputer in Memphis, Tennessee, exemplifies this agility: constructed in 122 days starting in 2024, it initially comprised 100,000 GPUs, expanding to 230,000 by mid-2025 and becoming the world's largest AI training system at the time. Similarly, leading AI developers operate frontier-scale clusters, leveraging partnerships such as a $100 billion commitment for multi-gigawatt data centers with millions of GPUs, contributing to the private sector's dominance of global AI compute capacity, which reached 80% by 2025. NVIDIA's DGX Spark, released in October 2025, further democratizes access by packaging the Grace Blackwell architecture into a desktop-form AI supercomputer capable of handling models up to 200 billion parameters with 1 petaflop of FP4 performance, targeting developers and researchers. These market-driven efforts contrast with U.S. export restrictions, enforced by the Bureau of Industry and Security (BIS), which limit transfers of advanced computing items and supercomputing technologies to entities posing national-security risks, particularly in China. The Entity List, expanded in 2025 with additions including 42 Chinese entities in March and 23 in September, requires licenses for high-performance semiconductors and prohibits exports supporting military modernization, including supercomputer components. Proponents argue these measures enhance U.S. security by curbing adversaries' capabilities in AI-enabled warfare and weapons development, as evidenced by controls targeting supercomputing for PRC military programs. Critics contend the restrictions may impede innovation and slow broader technological progress, yet empirical data indicate minimal detriment to U.S. competitiveness: a 2024 analysis of 30 leading firms found no hindrance to R&D output post-controls, while U.S. private investments have surged, such as a $500 billion data-center commitment announced in 2025. While government subsidies via acts like the CHIPS and Science Act can distort resource allocation, private-sector adaptability—demonstrated by Colossus's rapid scaling—has sustained U.S. leadership, enabling faster iteration than state-directed models elsewhere.

Controversies and Counterarguments

Fiscal and Opportunity Costs

The development of exascale supercomputers typically requires investments exceeding $500 million per system, as evidenced by the U.S. Department of Energy's Frontier supercomputer at Oak Ridge National Laboratory, which cost $600 million to procure and deploy in 2022. Similarly, Europe's JUPITER exascale system, operational in 2025, carried a price tag of approximately €500 million including initial operations, funded through the EuroHPC Joint Undertaking with contributions split between the EU and member states. These figures encompass hardware, integration, and early operational expenses but exclude ongoing power and maintenance costs, which can add tens of millions of dollars annually due to high energy demands. Private-sector initiatives demonstrate contrasting fiscal efficiency, with xAI's Colossus cluster in Memphis achieving rapid deployment—initial phases operational within months of its mid-2024 announcement—at an estimated $4 billion for the first stage, scaled via commercial GPU purchases without equivalent public subsidies. This contrast highlights opportunity costs in government-led projects, where bureaucratic procurement and international collaboration often extend timelines; for instance, Europe's exascale efforts lagged U.S. counterparts by several years despite comparable per-system budgets, with delays attributed to component dependencies and funding coordination. Critics argue that such expenditures divert resources from immediate societal needs like poverty alleviation or basic infrastructure, positing supercomputing as a luxury amid fiscal constraints. However, empirical analyses counter this by quantifying high returns: a Hyperion Research study found that every $1 invested in HPC yields $44 in downstream profits through innovations in industries such as manufacturing and pharmaceuticals, while a Finnish CSC evaluation reported €25-37 in societal benefits per euro invested, encompassing scientific advancements and economic multipliers. Proponents emphasize these systems' role in securing technological leadership, where forgoing investment risks ceding ground in compute-intensive fields like artificial intelligence and advanced simulation, potentially amplifying long-term opportunity costs through lost competitiveness. Government projects, while enabling broad access, incur overruns from delays—such as Europe's deferred exascale milestones—contrasting with private ventures' agility in iterating at market-driven paces.

Environmental Assertions Versus Data

Critics of supercomputer deployments frequently highlight localized environmental impacts, such as the allegations surrounding xAI's Colossus facility in Memphis, Tennessee, where over 30 unpermitted gas turbines were initially operated to meet power demands, prompting lawsuits from environmental and civil-rights groups over potential air-quality and health risks in nearby communities. Such assertions often amplify temporary grid and emission strains without accounting for broader offsets, including the small scale of supercomputing's global footprint: the combined power draw of TOP500-listed systems, totaling around 1-2 gigawatts at peak, equates to roughly 0.01% or less of worldwide primary energy demand, yielding emissions far below 0.1% of annual global CO2 output even under average grid carbon intensities (a rough calculation follows below). This disparity underscores selective outrage, as supercomputer-driven advancements—like molecular simulations accelerating drug discovery—yield downstream energy savings by minimizing resource-intensive wet-lab trials and physical prototyping, with AI models reducing development timelines from years to months in cases like protein-structure prediction. While renewables integration is feasible, as demonstrated by Europe's JUPITER exascale system in Germany, reported to run on 100% renewable electricity while achieving roughly 60 gigaflops per watt, it is not a prerequisite for viable supercomputing, given that fossil backups ensure reliability during peak loads without derailing net progress. Community benefits, including thousands of high-tech jobs and infrastructure upgrades in host regions like Memphis, often outweigh short-term disruptions, with local utilities affirming minimal long-term grid risks through demand-response adaptations. Claims of enduring strain ignore hardware innovations outpacing regulatory timelines: photonic interconnects and microfluidic cooling in next-generation chips have cut per-operation energy needs by factors of 3-6, while GPU architectures like NVIDIA's Blackwell deliver sustained efficiency gains, compressing supercomputers' lifecycle footprints faster than incremental policy mandates. These dynamics reveal that alarmist narratives, amplified by advocacy media, overlook empirical trade-offs in which compute-enabled efficiencies, such as optimized industrial processes, systematically mitigate upstream consumption.
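
The footprint figures quoted above can be sanity-checked with simple arithmetic; the global average primary-power value used below is an assumed round number for illustration, not a measured statistic.

```python
# Rough check of the scale claims above: the share of global energy demand represented by a
# 1-2 GW TOP500 fleet, and the energy per floating-point operation at ~60 GF/W.
global_primary_power_gw = 19_000        # assumption: ~19 TW average global primary energy demand

for fleet_gw in (1.0, 2.0):
    share = fleet_gw / global_primary_power_gw
    print(f"{fleet_gw:.0f} GW fleet ~ {share:.4%} of global demand")   # ~0.005% to ~0.011%

efficiency_gflops_per_watt = 60          # cited figure for a leading energy-efficient system
joules_per_flop = 1.0 / (efficiency_gflops_per_watt * 1e9)
print(f"energy per FLOP at 60 GF/W ~ {joules_per_flop:.1e} J")         # ~1.7e-11 J
```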

Security Risks and Ethical Dilemmas

Supercomputers, owing to their vast computational scale and interconnected architectures, present amplified cybersecurity vulnerabilities compared to conventional systems. In 2020, at least a dozen European supercomputers, including systems in Germany, Switzerland, and Spain, were compromised by attackers seeking to hijack resources for cryptocurrency mining, leading to temporary shutdowns and disruptions in scientific research. Similarly, the UK's ARCHER supercomputer suffered a security incident in May 2020, in which intruders exploited login nodes, forcing operators to disable external access and halting simulations, including pandemic modeling, for several days. These incidents, though infrequent, underscore the potential for catastrophic data breaches or resource commandeering, particularly as supercomputers often process sensitive national data; state-sponsored actors have been implicated in broader campaigns targeting research computing infrastructure, though direct attributions to supercomputer breaches remain classified or unverified in public reports. The dual-use nature of supercomputing exacerbates ethical tensions, as the same hardware optimized for civilian applications—such as protein folding for drug discovery—can simulate complex weapons systems or pathogen engineering. For instance, the U.S. Department of Defense deployed a dedicated supercomputer in 2024 explicitly for biodefense simulations, AI-driven vaccine design, and modeling chemical-biological threats to enhance protective measures and surveillance. However, this capability inherently risks repurposing for offensive bioweapons development, as high-fidelity simulations could accelerate the design of engineered viruses or toxins, a concern amplified by the technology's transferability to non-state actors via stolen code or hardware. Ethical frameworks highlight the challenge of proportionality: while military opacity in classified simulations (e.g., nuclear stockpile modeling) safeguards national security, it limits civilian oversight and global collaboration, potentially fostering arms-race dynamics if adversarial nations outpace defensive governance. Debates over computational supremacy further illustrate ethical dilemmas in benchmarking claims and hype-driven narratives. Claims of quantum supremacy, such as Google's 2019 Sycamore demonstration purporting to outperform classical supercomputers on random circuit sampling, faced immediate challenges from classical simulations achieving comparable results with optimized algorithms on systems like IBM's. More recent assertions, including Google's 2025 Quantum Echoes algorithm purportedly running 13,000 times faster than supercomputer equivalents on certain tasks, continue to be contested by advances in classical methods and GPU clusters that replicate or approximate quantum outputs without exotic hardware, questioning the practical exclusivity of quantum advantages. This underscores a broader ethical imperative for empirical validation over promotional benchmarks, since overhyped paradigm shifts divert funding from scalable classical supercomputing, which remains indispensable for verifiable, energy-efficient simulations in defense and science, provided governance prioritizes national sovereignty over unsubstantiated internationalist ideals.

Recent Advances and Future Trajectories

Milestones Post-2020 (e.g., El Capitan Era)

The Frontier supercomputer at Oak Ridge National Laboratory achieved the first verified exascale performance milestone on May 30, 2022, with a High-Performance Linpack (HPL) score of 1.102 exaflops, surpassing the exascale threshold of one quintillion floating-point operations per second. Built by Hewlett Packard Enterprise for the U.S. Department of Energy, Frontier's peak performance reaches 1.7 exaflops using AMD processors and GPUs, enabling advancements in simulations for climate modeling, materials science, and nuclear stockpile stewardship amid U.S. geopolitical priorities in computational sovereignty. By November 2024, its HPL result had improved to 1.35 exaflops even as it moved to the second position on the TOP500 list. El Capitan, deployed at Lawrence Livermore National Laboratory, assumed the top TOP500 ranking in November 2024 as the third exascale system, with HPL performance exceeding Frontier's and a focus on national security applications like nuclear weapons simulations. Officially dedicated on January 9, 2025, and powered by AMD Instinct MI300A accelerators integrated with HPE hardware, El Capitan retained its number-one status through the June 2025 TOP500 edition, underscoring U.S. leadership in sustained exascale deployment despite export controls on advanced chips to rivals like China. Academic institutions advanced AI-oriented systems in 2025: New York University unveiled a new supercomputer in October featuring over 500 H200 GPUs for 10.79 petaflops of performance, five times its predecessor, ranking 40th on the Green500 for energy efficiency, while MIT Lincoln Laboratory's TX-GAIN, also launched in October 2025, delivers 2 exaflops of AI compute optimized for generative models, biodefense, and materials discovery, making it the most powerful university-based system in the U.S. Private-sector initiatives shifted toward massive AI training clusters, exemplified by xAI's Colossus, constructed in 122 days starting in 2024 in Memphis, Tennessee, using 100,000 GPUs to form the world's largest AI supercomputer at the time, dedicated to training the Grok models and scalable toward one million GPUs. NVIDIA's Blackwell architecture, introduced in systems like the GB10 Grace Blackwell Superchip by early 2025, enabled compact petaflop-scale AI prototypes such as Project DIGITS and fueled enterprise AI factories, prioritizing dense GPU interconnects over traditional HPL benchmarks. TOP500 data post-2020 reflect decelerating aggregate performance growth, with the aggregate of leading systems rising from a full-list total of 2.22 exaflops in June 2020 to around 3 exaflops contributed by just three exascale machines by mid-2025, indicating doubling times longer than the pre-exascale era's Moore's Law-like cadence (see the worked calculation below). Concurrently, Green500 rankings highlight efficiency gains, with NVIDIA Grace Hopper-based systems dominating top spots (e.g., sweeping the top three in 2024) and leading entries exceeding 60 gigaflops per watt, balancing AI-driven power demands with liquid cooling and specialized accelerators. These trends align with geopolitical emphases on compute for economic and defense advantages, where U.S. vendors supply most high-end systems amid restrictions on technology transfers.
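
The deceleration claim can be made concrete by computing the doubling time implied by the two figures quoted in this paragraph; the calculation below simply applies those numbers and contrasts the result with the much faster doubling historically associated with the TOP500 aggregate.

```python
# Worked example: implied doubling time from the two performance points quoted above
# (2.22 EF full-list total in June 2020, ~3 EF from the three exascale machines by mid-2025).
import math

p_start, p_end = 2.22, 3.0         # exaflops
years = 5.0                        # June 2020 -> mid-2025

growth_per_year = (p_end / p_start) ** (1 / years)
doubling_time = math.log(2) / math.log(growth_per_year)
print(f"implied doubling time ~ {doubling_time:.1f} years")   # ~11.5 years

# By contrast, the historical TOP500 trend of roughly 2x every 1-1.5 years corresponds to
# growth_per_year between about 1.6 and 2.0.
```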

Pathways to Zettascale and Beyond

Efforts to achieve zettascale computing, defined as sustained performance of 10^{21} floating-point operations per second (FLOPS), target deployment in the 2030s through national initiatives like Japan's FugakuNEXT supercomputer, planned for operation around 2030 with ambitions exceeding 1,000 times current exascale capabilities in select metrics. Such projections, echoed in optimistic vendor roadmaps like Intel's 2021 goal of zettascale by 2027, assume aggressive scaling but confront empirical limits from historical performance doublings, which have averaged 2-3x per generation rather than the 10x every five years implied by some plans. U.S. Department of Energy post-exascale systems, such as the planned ATS-5 deployment in 2027, prioritize incremental advances toward this scale but highlight sustainability constraints over rapid leaps. A core barrier is the power wall, intensified by the breakdown of Dennard scaling circa 2006, whereby transistor miniaturization no longer yields proportional voltage reductions, leading to surging power density and total consumption. Exascale systems like Frontier operate at around 20-30 megawatts (MW) for 1 exaFLOPS; extrapolating to zettascale without efficiency gains would demand gigawatts, confining practical systems to roughly 100 MW envelopes absent innovations in photonic interconnects for reduced data-movement energy or 3D stacking to minimize latency and wiring overhead. Projections for zettascale at 500 MW assume efficiency targets of roughly 2,140 gigaFLOPS per watt, requiring about 40-fold improvements over current benchmarks, a trajectory strained by interconnect bottlenecks and thermal limits in dense node architectures (see the worked check below). Mitigation strategies emphasize software and architecture specialization, including domain-specific languages that tailor algorithms to hardware idiosyncrasies, thereby extracting higher effective throughput from heterogeneous accelerators without uniform scaling. Emerging classical designs integrate these optimizations for compute-bound kernels, prioritizing energy-proportional computing over brute-force parallelism, though vendor and national-laboratory roadmaps underscore that such approaches remain unproven at zettascale, with fault resilience and data-movement costs posing additional hurdles.
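
The efficiency requirement follows from straightforward division, as the worked check below shows. Note that the naive arithmetic yields about 2,000 gigaFLOPS per watt and a roughly 33-fold gap against a 60 gigaFLOPS-per-watt baseline, close to but slightly below the 2,140 gigaFLOPS per watt and 40-fold figures cited above, which presumably assume a different baseline.

```python
# Worked check of the zettascale power-wall arithmetic in the paragraph above.
zetta_flops = 1e21
power_budget_w = 500e6                     # 500 MW envelope

required_eff = zetta_flops / power_budget_w                          # FLOPS per watt
print(f"required efficiency ~ {required_eff / 1e9:,.0f} GFLOPS/W")   # = 2,000 GFLOPS/W

current_eff_gflops_w = 60                  # roughly the best current Green500-class figure
print(f"improvement needed ~ {required_eff / 1e9 / current_eff_gflops_w:.0f}x")  # ~33x

# Naive extrapolation without any efficiency gains: an exascale system at 20-30 MW scaled
# 1000x would demand tens of gigawatts, which is why sub-100 MW zettascale designs hinge on
# photonics, 3D stacking, and specialization rather than brute-force replication.
```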

Convergence with Quantum and Neuromorphic Tech

Hybrid quantum-classical supercomputing architectures integrate noisy intermediate-scale quantum (NISQ) processors with classical high-performance computing systems to exploit quantum advantages in targeted subroutines while relying on classical resources for error mitigation and scalability. In August 2025, IBM and AMD announced a collaboration to develop such systems, combining AMD CPUs, GPUs, and FPGAs with IBM quantum processors to handle hybrid workloads, including optimization problems where quantum circuits augment classical solvers. Empirical demonstrations in NISQ hybrids, such as quantum processors co-located with supercomputers like Japan's Fugaku, show quantum components accelerating specific simulations but requiring classical preprocessing and post-processing due to qubit decoherence times under milliseconds and gate error rates exceeding 0.1% in current 100-1,000 qubit systems (a schematic of the hybrid loop follows below). Recent claims of quantum advantage, such as Google's October 2025 announcement that its Willow chip's "Quantum Echoes" algorithm achieved a 13,000-fold speedup over the fastest classical supercomputer for a physics task, highlight potential in niche applications like random sampling or error-corrected benchmarks. However, these advantages pertain to contrived or narrowly defined problems; optimized classical algorithms on supercomputers, including tensor-network methods, have matched or exceeded quantum performance in broader practical tasks like molecular dynamics, underscoring quantum's current confinement to exploratory niches amid persistent limitations from logical error rates that necessitate thousands of physical qubits per reliable logical qubit. Neuromorphic computing, employing spiking neural networks to emulate brain-like event-driven processing, offers energy-efficient augmentation for AI workloads in supercomputing environments, particularly for edge inference and sparse, event-driven tasks. Intel's Loihi 2 processors enable prototypes like the 2024 Hala Point system, scaling to 1.15 billion neurons with demonstrated efficiency gains of orders of magnitude over GPU-based inference for small-scale tasks. Yet these systems operate at scales well below those of exascale supercomputers in core counts and sustained operations per second, limiting integration to accelerators rather than core replacements, as neuromorphic hardware excels in low-power sparsity but lacks the parallelism for sustained high-throughput scientific computing. Overall, both quantum and neuromorphic technologies serve as specialized co-processors within supercomputing frameworks, enhancing efficiency in domains like optimization or sparse inference without supplanting general-purpose architectures, constrained by empirical barriers in error resilience, interconnectivity, and thermodynamic scaling.
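
The hybrid workflow follows a simple structural pattern: a classical optimizer proposes parameters, a quantum routine returns a measured expectation value, and classical post-processing closes the loop. The sketch below uses a purely classical stand-in for the quantum call, since real deployments dispatch that step to a vendor QPU through its own SDK; every function and value here is hypothetical.

```python
# Schematic of a hybrid quantum-classical loop: a classical optimizer drives parameters for a
# quantum subroutine. The "expectation" below is a classical stand-in, not a real QPU call
# or any vendor's SDK.
import numpy as np

def quantum_expectation(theta):
    """Stand-in for a parameterized-circuit measurement; a real system would dispatch this
    to a QPU and average over many noisy shots."""
    return np.cos(theta[0]) * np.sin(theta[1]) + 0.5 * np.cos(theta[1])

def finite_difference_grad(f, theta, eps=1e-3):
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

theta = np.array([0.1, 0.1])
for _ in range(200):                       # classical outer loop
    grad = finite_difference_grad(quantum_expectation, theta)
    theta -= 0.1 * grad                    # gradient descent on the measured expectation

print("optimized parameters:", theta, "value:", quantum_expectation(theta))
```

In practice the classical side of such loops runs on conventional HPC nodes, which is why these hybrids are framed as co-processing within existing supercomputing facilities rather than replacements for them.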

References

  1. [1]
    Supercomputer - High Performance Computing
    A supercomputer is a high-level performance computer in comparison to a general-purpose computer. Supercomputers are used for computationally intensive tasks ...
  2. [2]
    Supercomputing - Department of Energy
    Supercomputing - also known as high-performance computing - is the use of powerful resources that consist of multiple computer systems working in parallel (i.e ...Missing: definition | Show results with:definition
  3. [3]
    What is High Performance Computing? | U.S. Geological Survey
    A supercomputer is one large computer made up of many smaller computers and processors. Each different computer is called a node. Each node has processors/ ...
  4. [4]
    Timeline of Computer History
    CDC 6600 supercomputer introduced​​ The Control Data Corporation (CDC) 6600 performs up to 3 million instructions per second —three times faster than that of its ...1937 · AI & Robotics (55) · Graphics & Games (48)
  5. [5]
    Supercomputing History: From Early Days to Today | HP® Tech Takes
    Jan 9, 2020 · Cray Supercomputers · Released in 1985 · First supercomputer to use liquid cooling · Performs calculations as fast as 1.9 gigaFLOPS ...
  6. [6]
    TOP500 List - June 2025
    TOP500 List - June 2025 · 1, El Capitan - HPE Cray EX255a, AMD 4th Gen EPYC 24C 1.8GHz, AMD Instinct MI300A, Slingshot-11, TOSS, · 2, Frontier - HPE Cray EX235a, ...
  7. [7]
    El Capitan still the world's fastest supercomputer in Top500 list ...
    Jun 10, 2025 · El Capitan has retained its title as the world's most powerful supercomputer in the 65th edition of the Top500 list.
  8. [8]
    The 9 most powerful supercomputers in the world right now
    and the planet's third-ever exascale machine — after coming online ...
  9. [9]
    What are supercomputers and why are they important
    Jan 19, 2023 · Supercomputing systems have already helped scientists overcome tough challenges, like isolating and identifying the spike protein in the COVID- ...
  10. [10]
    Supercharging Science with Supercomputers - NSF Impacts
    In today's fast-paced, data-driven world, computational power is key to developing life-saving drugs, predicting hurricanes and transforming countless other ...
  11. [11]
    What is a Supercomputer? | Definition from TechTarget
    Feb 11, 2025 · FLOPS are used in supercomputers to measure performance and are considered a more appropriate metric than MIPS due to their ability to provide ...Missing: thresholds | Show results with:thresholds
  12. [12]
    TOP500: Home -
    The 65th edition of the TOP500 showed that the El Capitan system retains the No. 1 position. With El Capitan, Frontier, and Aurora, there are now 3 Exascale ...Lists · June 2018 · November 2018 · TOP500 List
  13. [13]
    2 Explanation of Supercomputing | Getting Up to Speed
    “Supercomputer” refers to computing systems (hardware, systems software, and applications software) that provide close to the best currently achievable ...Missing: attributes | Show results with:attributes
  14. [14]
    1.1 Parallelism and Computing - Mathematics and Computer Science
    A parallel computer is a set of processors that are able to work cooperatively to solve a computational problem.
  15. [15]
    What Is a Supercomputer and How Does It Work? - Built In
    A supercomputer's high-level of performance is measured by floating-point operations per second (FLOPS), a unit that indicates how many arithmetic problems a ...Supercomputer Definition · Supercomputers Vs... · Supercomputers Vs. Quantum...Missing: attributes | Show results with:attributes<|separator|>
  16. [16]
    What Is Supercomputing? - IBM
    Supercomputing is a form of high-performance computing that determines or calculates by using a powerful computer, reducing overall time to solution.
  17. [17]
    [PDF] High Performance Interconnect Technologies for Supercomputing
    Feb 19, 2024 · This survey investigates current popular interconnect topologies driving the most powerful supercomputers. High-Performance Computing (HPC) ...
  18. [18]
    [PDF] Fault tolerance techniques for high-performance computing
    Designing a fault-tolerant system can be done at different levels of the software stack. We call general- purpose the approaches that detect and correct the ...
  19. [19]
    New Approach to Fault Tolerance Means More Efficient High ...
    Mar 30, 2021 · This approach involves building procedures for detecting faults and correcting errors that are specific for particular numerical algorithms. The ...
  20. [20]
    Massively Parallel Computing - an overview | ScienceDirect Topics
    The possibility of working on each sequence independently makes data parallel approaches resulting in high scalability and performance figures for many ...
  21. [21]
    Supercomputer - an overview | ScienceDirect Topics
    Supercomputers are defined as the largest and most powerful computers, capable of performing rapid calculations and requiring specialized environments and ...
  22. [22]
    Cloud Computing vs High Performance Computing (HPC)
    Aug 11, 2025 · HPC: Ultra-low latency interconnects like InfiniBand HDR/NDR/ XDR (200Gbps, 400Gbps, 800Gbps+) are the gold standard. These require high ...
  23. [23]
    HPC vs. Regular Computing: The Crucial Differences Everyone ...
    The high-bandwidth, low-latency network interconnects in HPC systems ensure that this inter-node communication is efficient, minimizing overhead and allowing ...Missing: supercomputers | Show results with:supercomputers
  24. [24]
    What is the difference between a Cluster and MPP supercomputer ...
    Apr 6, 2011 · Compared to a cluster, a modern MPP (such as the IBM Blue Gene) is more tightly-integrated: individual nodes cannot run on their own and they ...
  25. [25]
    Supercharge CFD Simulations With a Supercomputer | Diabatix
    Nov 25, 2022 · Their bare-metal system equipped results in low-latency interconnect, and their compute nodes are tailored for CAE workloads. We further ...
  26. [26]
    Experience and Analysis of Scalable High-Fidelity Computational ...
    May 9, 2024 · In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture at scale through domain ...
  27. [27]
    [PDF] Vetrei - FUN3D - NASA
    These systems are large, tightly-coupled computers with high bandwidth and low latency interconnects with an optimized message-passing library, such as MPI ...
  28. [28]
    How AI and Accelerated Computing Are Driving Energy Efficiency
    Jul 22, 2024 · As a result, it consumes less energy than general-purpose servers that employ CPUs built to handle one task at a time. That's why accelerated ...
  29. [29]
    Understanding the Total Cost of Ownership in HPC and AI Systems
    Aug 22, 2024 · Understanding and calculating TCO is vital for organizations investing in HPC, AI, and advanced computing resources.
  30. [30]
    On-Premise vs Cloud: Generative AI Total Cost of Ownership
    May 23, 2025 · This paper presents a total cost of ownership (TCO) analysis, focusing on AI/ML use cases such as Large Language Models (LLMs), where infrastructure costs are ...
  31. [31]
    What Is High-Performance Computing (HPC)? - IBM
    Unlike mainframes, supercomputers are much faster and can run billions of floating-point operations in one second. Supercomputers are still with us; the fastest ...
  32. [32]
    ENIAC - CHM Revolution - Computer History Museum
    ENIAC (Electronic Numerical Integrator And Computer), built between 1943 and 1945—the first large-scale computer to run at electronic speed without being slowed ...Missing: performance FLOPS proto-<|separator|>
  33. [33]
    The incredible evolution of supercomputers' powers, from 1946 to ...
    Apr 22, 2017 · In 1946, ENIAC, the first (nonsuper) computer, processed about 500 FLOPS (calculations per second). Today's supers crunch petaFLOPS—or 1000 ...
  34. [34]
    CDC 6600 is introduced - Event - The Centre for Computing History
    Between 1964 and 1969 the CDC 6600 was the world's fastest computer, with performance of up to three megaFLOPS. The first machine was delivered to Lawrence ...Missing: 3 MFLOPS
  35. [35]
    CDC 6600 | Computational and Information Systems Lab
    The CDC 6600 is arguably the first supercomputer. It had the fastest clock speed for its day: 100 nanoseconds.Missing: MFLOPS | Show results with:MFLOPS
  36. [36]
    A History of Supercomputers | Extremetech
    Jan 11, 2025 · From the CDC 6600 to Seymour Cray and beyond, supercomputers dominated science, industrial, and military research for decades.
  37. [37]
    Cray History - Supercomputers Inspired by Curiosity - Seymour Cray
    TECH STORY: Cray Research achieved the Cray-1's record-breaking 160 megaflops performance through its small size and cylindrical shape, 1 million-word ...
  38. [38]
    [PDF] LIBIItttlY - NASA Technical Reports Server (NTRS)
    the maximum speed on the Cray-1 is 160 MFLOPS for addition and multiplication running concurrently. On the X-MP, this figure increases to. 210 MFLOPS per ...
  39. [39]
    Future of supercomputing - ScienceDirect.com
    As shown in the previous section, the first-half of the 1990s is characterized by the shift from vector computers to parallel computers based on COTS (Commodity ...
  40. [40]
    Vectors: How the Old Became New Again in Supercomputing
    Sep 26, 2016 · Vector instructions, once a powerful performance innovation of supercomputing in the 1970s and 1980s became an obsolete technology in the 1990s.
  41. [41]
    25 Year Anniversary | TOP500
    Intel's ASCI Red supercomputer was the first teraflop/s computer, taking the No.1 spot on the 9th TOP500 list in June 1997 with a Linpack performance of 1.068 ...
  42. [42]
    [PDF] THE FUTURE OF SUPERCOMPUTING
    by higher performance than mainstream computing. However, as the price of computing has dropped, the cost/performance gap between mainstream computers and ...
  43. [43]
    Computer Organization | Amdahl's law and its proof - GeeksforGeeks
    Aug 21, 2025 · Amdahl's Law, proposed by Gene Amdahl in 1967, explains the theoretical speedup of a program when part of it is improved or parallelized.
  44. [44]
    [PDF] Overview of the Blue Gene/L system architecture
    Apr 7, 2005 · It is designed to scale to 65,536 dual-processor nodes, with a peak performance of 360 teraflops.
  45. [45]
    [PDF] Blue Gene/L Architecture
    Jun 2, 2004 · June 2, 2004: 2 racks DD2 (1024 nodes at 700 MHz) running Linpack at 8.655 TFlops/s. This would displace #5 on 22nd Top500 list. Page 5. Blue ...
  46. [46]
    China Benchmarks World's Fastest Super: 2.5 Petaflops Powered by ...
    Oct 27, 2010 · […] China announced that their new Tianhe-1A super computer has set a new performance record of 2.507 petaflops on the […] Current “fastest” ...
  47. [47]
    [PDF] A large-scale study of failures in high-performance computing systems
    Root causes fall in one of the follow- ing five high-level categories: Human error; Environment, including power outages or A/C failures; Network failure;.Missing: supercomputers 2010s
  48. [48]
    Job failures in high performance computing systems: A large-scale ...
    Existing works of failure analysis often miss the study of probing to inherent common characteristics of failures, which could be used to identify a potential ...
  49. [49]
    Frontier supercomputer hits new highs in third year of exascale | ORNL
    Nov 18, 2024 · The Frontier supercomputer took the No. 2 spot on the November 2024 TOP500 list, which ranks the world's fastest supercomputers.
  50. [50]
    Frontier - Oak Ridge Leadership Computing Facility
    Exascale is the next level of computing performance. By solving calculations five times faster than today's top supercomputers—exceeding a quintillion, or 1018, ...
  51. [51]
    Aurora Exascale Supercomputer - Argonne National Laboratory
    Aurora is one of the world's first exascale supercomputers, able to perform over a quintillion calculations per second. Housed at the Argonne Leadership ...Aurora · Aurora by the Numbers · Argonne’s Aurora... · Aurora Early Science
  52. [52]
    El Capitan Retains Top Spot in 65th TOP500 List as Exascale Era ...
    The 65th edition of the TOP500 showed that the El Capitan system retains the No. 1 position. With El Capitan, Frontier, and Aurora, there are now 3 Exascale ...
  53. [53]
    Europe enters the exascale supercomputing league with 'JUPITER'
    Sep 4, 2025 · Officially ranked as Europe's most powerful supercomputer and the fourth fastest worldwide, JUPITER combines unmatched performance with a strong ...
  54. [54]
    Performance Development | TOP500
    List Statistics · Treemaps · Development over Time · Efficiency, Power ... Performance Development. Performance Development Sum #1 #500.Missing: minimum threshold entry level
  55. [55]
    Colossus | xAI
    We doubled our compute at an unprecedented rate, with a roadmap to 1M GPUs. Progress in AI is driven by compute and no one has come close to building at this ...
  56. [56]
    NVIDIA Ethernet Networking Accelerates World's Largest AI ...
    Oct 28, 2024 · The NVIDIA Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis and execution of AI workloads.
  57. [57]
    [PDF] Frontier Architecture Overview
    Feb 28, 2024 · Frontier uses HPE Cray EX architecture with 9408 nodes, 3rd Gen AMD EPYC CPUs, 4 AMD Instinct MI250X GPUs, and HPE Slingshot interconnect. Each ...
  58. [58]
    FUJITSU Processor A64FX
    The A64FX is a top-level processor with 48 calculation cores, SVE, 3.3792 teraflops peak performance, 7nm process, and 2.5D packaging for power efficiency.
  59. [59]
    Fujitsu A64FX: Arm-powered Heart of World's Fastest Supercomputer
    Jul 10, 2020 · Add it all up, and the Fugaku supercomputer consists of 432 racks with a total of 158,976 Fujitsu A64FX processors and 8 million Arm cores. It's ...
  60. [60]
    Frontier - Oak Ridge Leadership Computing Facility
    1 64-core AMD “Optimized 3rd Gen EPYC” CPU 4 AMD Instinct MI250X GPUs. GPU Architecture: AMD Instinct MI250X GPUs, each feature 2 Graphics Compute Dies (GCDs) ...
  61. [61]
    World's First Exascale Supercomputer Powered by AMD EPYC ...
    May 30, 2022 · Frontier supercomputer, powered by AMD EPYC CPUs and AMD Instinct Accelerators, achieves number one spots on Top500, Green500 and HPL-AI performance lists.Missing: architecture | Show results with:architecture
  62. [62]
    Single Instruction Multiple Data - an overview | ScienceDirect Topics
    SIMD, or single instruction, multiple data, is defined as a type of vector operation that allows the same instruction to be applied to multiple data items ...
  63. [63]
    Explainer: What Are Tensor Cores? - TechSpot
    Jul 27, 2020 · Known as tensor cores, these mysterious units can be found in thousands of desktop PCs, laptops, workstations, and data centers around the world.
  64. [64]
  65. [65]
    Tradeoffs To Improve Performance, Lower Power
    Mar 11, 2021 · There is always a tradeoff between having an accelerator be programmable and extracting the greatest performance and efficiency. GPUs, TPUs, and ...
  66. [66]
    Highlights - June 2025 - TOP500
    A total of 237 systems on the list are using accelerator/co-processor technology, up from 210 six months ago. 82 of these use 18 chips, 68 use NVIDIA Ampere, ...
  67. [67]
    The Captain Has Crossed the Frontier - HPCwire
    Nov 18, 2024 · The HPE Cray EX architecture combines 3rd Gen AMD EPYC™ CPUs optimized for HPC and AI with AMD Instinct™ 250X accelerators and a Slingshot-11 ...
  68. [68]
    Bringing HPE Slingshot 11 support to Open MPI
    Oct 10, 2024 · The Cray HPE Slingshot 11 network is used on the new exascale systems arriving at the U.S. Department of Energy (DoE) laboratories (e.g., ...
  69. [69]
    Lawrence Livermore National Laboratory's El Capitan verified as ...
    Nov 18, 2024 · El Capitan is the fastest computing system ever benchmarked. The system has a total peak performance of 2.79 exaFLOPs.
  70. [70]
    XSEDE Welcomes New Service Providers - HPCwire
    Jan 7, 2021 · FASTER will have HDR InfiniBand interconnection and access/share a 5PB usable high-performance storage system running Lustre filesystem. 30 ...
  71. [71]
    [PDF] Bandwidth-optimal All-to-all Exchanges in Fat Tree Networks
    Jun 10, 2013 · bisection of this topology. Thus, intuitively, all-to-all ex- changes require only half bisection bandwidth for arbitrary topologies. The ...
  72. [72]
    [PDF] Lecture 29: Network interconnect topologies - Edgar Solomonik
    Dec 7, 2016 · Fat-tree network topology. Fat-tree bisection bandwidth. Fat-trees can be specified differently depending on the desired properties to achieve ...
  73. [73]
    Scaling - HPC Wiki
    Jul 19, 2024 · In the most general sense, scalability is defined as the ability to handle more work as the size of the computer or application grows.
  74. [74]
    Explained: Amdahl's and Gustafson's Law; Weak vs Strong scaling
    Oct 30, 2023 · At its core, scalability refers to the capacity of a system or application to efficiently manage increased workloads as its size expands. In ...
  75. [75]
    Optical interconnects for extreme scale computing systems
    We review some important aspects of photonics that should not be underestimated in order to truly reap the benefits of cost and power reduction. Introduction.
  76. [76]
    [PDF] A Large-Scale Study of Failures on Petascale Supercomputers - JCST
    This study analyzes the source of failures on two typical petascale supercomputers called Sunway BlueLight (based on multi-core CPUs) and Sunway TaihuLight ( ...
  77. [77]
    [PDF] An Investigation into Reliability, Availability, and Serviceability (RAS ...
    A study has been completed into the RAS features necessary for Massively Parallel Processor (MPP) systems. As part of this research, a use case model was built ...
  78. [78]
    Anton 3 | PSC
    Anton 3: twenty microseconds of molecular dynamics ... molecular dynamics simulations roughly 100 times faster than any other general-purpose supercomputer.
  79. [79]
    Anton 3: Twenty Microseconds of Molecular Dynamics Simulation ...
    This speedup means that a 512-node Anton 3 simulates a million atoms at over 100 microseconds per day. Furthermore, Anton 3 attains this performance while ...
  80. [80]
    Quantifying the performance of the TPU, our first machine learning ...
    Apr 5, 2017 · On our production AI workloads that utilize neural network inference, the TPU is 15x to 30x faster than contemporary GPUs and CPUs. The TPU also ...Missing: clusters | Show results with:clusters
  81. [81]
    [PDF] The Decline of Computers as a General Purpose Technology
    Nov 5, 2018 · In each of these cases, specialized processors perform better because different trade-offs can be made to tailor the hardware to the calculation ...<|separator|>
  82. [82]
    How Supercomputers Are Changing Biology | by Macromoltek, Inc.
    Aug 26, 2021 · There's an almost universal tradeoff between speed and generality that even supercomputers must face. While general-purpose supercomputers ...
  83. [83]
    The Linpack Benchmark - TOP500
    The benchmark used in the LINPACK Benchmark is to solve a dense system of linear equations. For the TOP500, we used that version of the benchmark.Missing: FLOPS | Show results with:FLOPS
  84. [84]
    Top500 Supercomputers: Who Gets The Most Out Of Peak ...
    Nov 13, 2023 · ... HPL as a sole performance metric for comparing supercomputers. That said, we note that at 55.3 percent of peak, the HPL run on the new ...
  85. [85]
    HPCG Benchmark
    HPCG is intended as a complement to the High Performance LINPACK (HPL) benchmark, currently used to rank the TOP500 computing systems.HPCG Software Releases · HPCG Overview · HPCG Publications · FAQ
  86. [86]
    The High-Performance Conjugate Gradients Benchmark - SIAM.org
    Jan 29, 2018 · The performance levels of HPCG are far below those seen by HPL. This should not be surprising to those in the high-end and supercomputing ...
  87. [87]
    Benchmark MLPerf Training: HPC | MLCommons V2.0 Results
    The MLPerf HPC benchmark suite measures how fast systems can train models to a target quality metric using V2.0 results.Results · Benchmarks · Scenarios & Metrics
  88. [88]
    [PDF] Supercomputer Benchmarks ! A comparison of HPL, HPCG ... - HLRS
    ❖ HPL sometimes produces rankings contrary to our intuition. ❖ Too easy to build stunt machines: ▫ Achieve high Linpack. ▫ Are not good for much ...
  89. [89]
    Memory Bandwidth and Machine Balance - Computer Science
    This report presents a survey of the memory bandwidth and machine balance on a variety of currently available machines.
  90. [90]
    About | TOP500
    The TOP500 project was launched in 1993 to improve and renew the Mannheim supercomputer statistics, which had been in use for seven years.Missing: history | Show results with:history
  91. [91]
    June 2025 - TOP500
    The 65th edition of the TOP500 showed that the El Capitan system retains the No. 1 position. With El Capitan, Frontier, and Aurora, there are now 3 Exascale ...
  92. [92]
    TOP500: El Capitan Stays on Top, US Holds Top 3 Supercomputers ...
    Jun 10, 2025 · The new TOP500 list of the world's most powerful supercomputers, released this morning at the ISC 2025 conference in Germany, shows an expanding European ...
  93. [93]
    Top500: China Opts Out of Global Supercomputer Race
    May 13, 2024 · The Top500 list recognizes 500 of the world's fastest computers based on benchmarks stipulated by the organization. The Top500 list is highly ...
  94. [94]
    [PDF] The TOP500 List and Progress in High- Performance Computing
    Nov 2, 2015 · The TOP500 is often criticized because the published performance num- bers for Linpack are far lower than what is achievable for actual applica ...Missing: Critiques | Show results with:Critiques
  95. [95]
    The changing face of supercomputing: why traditional benchmarks ...
    Sep 25, 2025 · The TOP500 originally launched as a simple but revolutionary idea in 1993: rank supercomputers by their performance on a standardised benchmark, ...<|separator|>
  96. [96]
    Looking Beyond Linpack: New Supercomputing Benchmark in the ...
    Jul 24, 2013 · With so much emphasis and funding invested in the Top500 rankings, the 20-year old Linpack benchmark has come under scrutiny, with some in the ...Missing: bias | Show results with:bias
  97. [97]
    Pros and Cons of HPCx benchmarks - SC18
    The most important criticism is that HPL measures only the peak floating point performance and its result has little correlation with real application ...
  98. [98]
    [PDF] Co-design of Advanced Architectures for Graph Analytics using ...
    Instead of a computation-intensive benchmark like the High Performance. Linpack (HPL) [28], the Graph500 is focused on data-intensive workloads [24]. We used ...
  99. [99]
    Automated Tuning of HPL Benchmark Parameters for Supercomputers
    This research presents an automated tuning approach for optimizing parameters of the High-Performance Linpack (HPL) benchmark, which is crucial for assessing ...
  100. [100]
    [PDF] High Performance Computing Instrumentation and Research ...
    Abstract. This paper studies the relationship between investments in High-Performance. Computing (HPC) instrumentation and research competitiveness.<|separator|>
  101. [101]
    An HPC Benchmark Survey and Taxonomy for Characterization - arXiv
    Sep 10, 2025 · Some benchmarks are collected into benchmark suites, typically created for system procurements, to replicate a desired measurement and workload ...
  102. [102]
    Cray -1 super computer: The power supply - EDN Network
    Apr 18, 2013 · The machine and its power supplies consumed about 115 kW of power; cooling and storage likely more than doubled this figure.
  103. [103]
    The Beating Heart of the World's First Exascale Supercomputer
    Jun 24, 2022 · The lab says its world-leading supercomputer consumes about 21 megawatts. “Everyone up and down the line went after efficiency.”
  104. [104]
    A Global Perspective on Supercomputer Power Provisioning: Case ...
    Aug 22, 2025 · In the histogram, the median power consumption was 2.888 MW and the maximum power consumption was 4.301 MW. Note that the finer grained dataset ...
  105. [105]
    Energy dataset of Frontier supercomputer for waste heat recovery
    Oct 3, 2024 · Frontier, despite its efficient design, consumes between 8 and 30 MW of electricity—equivalent to the energy consumption of several thousand ...
  106. [106]
    Biological computers could use far less energy than current ...
    Feb 2, 2025 · A 2023 paper that I co-authored showed that a computer could then operate near the Landauer limit, using orders of magnitude less energy than today's computers.
  107. [107]
    Frontier to Meet 20MW Exascale Power Target Set by DARPA in 2008
    Jul 14, 2021 · Frontier is poised to hit the 20 MW power goal set by DARPA in 2008 by delivering more than 1.5 peak exaflops of performance inside a 29 MW power envelope.Missing: EFLOPS | Show results with:EFLOPS
  108. [108]
    Laying the Groundwork for Extreme-Scale Computing
    A Supercomputing Power Boost. DOE's target for exascale machine power is 20 megawatts or less—a number aimed at balancing operating costs with computing ...Missing: EFLOPS | Show results with:EFLOPS<|separator|>
  109. [109]
    Power requirements of leading AI supercomputers have doubled ...
    Jun 5, 2025 · In January 2019, Summit at Oak Ridge National Lab had the highest power capacity of any AI supercomputer at 13 MW. Today, xAI's Colossus ...
  110. [110]
    Which Liquid Cooling Is Right for You? Immersion and Direct-to ...
    May 6, 2025 · The two main categories of liquid cooling are immersion and direct-to-chip, and each has a single-phase and two-phase option.
  111. [111]
    Purdue Researchers Hit DARPA Cooling Target of 1000W/cm^2
    Oct 24, 2017 · Now, a group of researchers from Purdue University have devised an 'intra-chip' cooling technique that hits the 1000-watt per square centimeter ...
  112. [112]
    Data centers take the plunge - C&EN - American Chemical Society
    Aug 7, 2025 · Two-phase cooling immerses the circuits in fluorinated refrigerants that have boiling points of around 50 °C. The system takes advantage of the ...
  113. [113]
    Energy Consumption in Data Centers: Air versus Liquid Cooling
    Jul 28, 2023 · McKinsey and Company estimates that cooling accounts for nearly 40% of the total energy consumed by data centers.
  114. [114]
    High-Performance Computing Data Center Power Usage ... - NREL
    Apr 10, 2025 · Data centers focusing on efficiency typically achieve PUE values of 1.2 or less. PUE is the ratio of the total amount of power used by a ...
  115. [115]
    Liquid cooling leak damages millions of dollars in GPUs - Tech Stories
    Sep 25, 2025 · Overhead pipe mishap in Southeast Asia floods data centre aisle, proving liquid cooling's biggest fear.Missing: supercomputer PUE Summit
  116. [116]
    Microsoft finds underwater datacenters are reliable, practical and ...
    Sep 14, 2020 · The concept was considered a potential way to provide lightning-quick cloud services to coastal populations and save energy.Missing: savings | Show results with:savings
  117. [117]
    Current Cooling Limitations Slowing AI Data Center Growth - AIRSYS
    Sep 23, 2025 · Rack densities of 50-100kW are becoming the norm, and chip-level heat generation is hitting record highs. At scale, this compounds into massive ...Missing: supercomputer | Show results with:supercomputer
  118. [118]
  119. [119]
    [PDF] 2024 United States Data Center Energy Usage Report
    Dec 17, 2024 · This annual energy use also represents 6.7% to 12.0% of total U.S. electricity consumption forecasted for 2028.
  120. [120]
    The Cloud now has a greater carbon footprint than the airline industry
    Apr 30, 2024 · The airline industry currently accounts for 2.5% of the world's carbon emissions, while The Cloud accounts for somewhere between 2.5% to 3.7%.
  121. [121]
    [PDF] Analysis of the carbon footprint of HPC - HAL
    Sep 15, 2025 · 13 An equivalent to Moore's law for the energy efficiency trend. He observed that the number of computations per joule of energy roughly ...
  122. [122]
    General Atomics Scientists Leverage DOE Supercomputers to ...
    Aug 10, 2022 · These simulations allow researchers to test theories and design more effective experiments on devices like the DIII-D National Fusion Facility.
  123. [123]
    Harnessing Supercomputing Power for Drug Discovery - InventUM
    Jul 14, 2025 · Dr. Stephan Schürer's lab performed simulations necessary for creating drugs up to 10 times faster than with standard methods.
  124. [124]
    xAI Colossus - Supermicro
    Leading Liquid-Cooled AI Cluster · Generative AI SuperCluster With 256 NVIDIA HGX™ H100/H200 GPUs, 32 4U Liquid-cooled Systems · Inside the 100K GPU xAI Colossus ...
  125. [125]
    Energy efficiency trends in HPC: what high-energy and ... - Frontiers
    The growing energy demands of High Performance Computing (HPC) systems have made energy efficiency a critical concern for system developers and operators.
  126. [126]
    Operating system Family / Linux - TOP500
    The content of the TOP500 list for June 2024 is still subject to change until the publication of the list at 11:00 am CEST (05:00 am EDT) Tuesday, June 10, ...
  127. [127]
    Transparent Hugepage Support - The Linux Kernel documentation
    Transparent HugePage Support (THP) is an alternative means of using huge pages for the backing of virtual memory with huge pages.
  128. [128]
    7.3. Configuring HugeTLB Huge Pages | Performance Tuning Guide
    In a NUMA system, huge pages assigned with this parameter are divided equally between nodes. You can assign huge pages to specific nodes at runtime by changing ...
  129. [129]
    Slurm Workload Manager: Efficient Cluster Management - GigaIO
    Slurm is the workload manager on about 60% of the TOP500 supercomputers around the world. It is designed to be highly efficient and fault-tolerant.
  130. [130]
    Overview - Slurm Workload Manager - SchedMD
    Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
  131. [131]
    Introduction to Slurm-The Backbone of HPC - Rafay
    Jun 23, 2025 · The Slurm scheduler can handle immense scale and has been battle tested on massive supercomputers. Handling ~10,000 nodes with 100s of jobs/ ...
  132. [132]
    Making a Case for Efficient Supercomputing - ACM Queue
    Dec 5, 2003 · I argue that efficiency, reliability, and availability will become the dominant issues by the end of this decade, not only for supercomputing, but also for ...
  133. [133]
    Singularity Containers Improve Reproducibility and Ease of Use in ...
    This presents an issue on High-Performance Computing (HPC) clusters required for advanced image analysis workflows as most users do not have root access.
  134. [134]
    Singularity to deploy HPC applications: a study case with WRF
    Jan 28, 2025 · Singularity introduces 11-15% performance overhead but offers portability and reproducibility benefits, and near-native performance for HPC ...
  135. [135]
    Unicos and other operating systems - Cray-History.net
    Aug 14, 2021 · the service elements run SUSE Linux. Cray Linux Environment (CLE): from release 2.1 onwards, UNICOS/lc is now called Cray Linux Environment.
  136. [136]
    Specifications - OpenMP
    Sep 15, 2025 · OpenMP API 6.0 Specification – Nov 2024: PDF download (Full specification); Amazon: Softcover book, Vol. 1 (Definitions, Directives and ...
  137. [137]
    Berkeley Unified Parallel C (UPC) Project
    The UPC language evolved from experiences with three other earlier languages that proposed parallel extensions to ISO C 99: AC, Split-C, and Parallel C ...
  138. [138]
    NVIDIA, Cray, PGI, CAPS Unveil 'OpenACC' Programming Standard ...
    Nov 13, 2011 · ... OpenACC standard beginning in the first quarter of 2012. The OpenACC standard is fully compatible and interoperable with the NVIDIA® CUDA ...
  139. [139]
    A Deep Dive Into Amdahl's Law and Gustafson's Law | HackerNoon
    Nov 11, 2023 · Discover in detail the background, theory, and usefulness of Amdahl's and Gustafson's laws. We also discuss the strong and weak scaling ...
  140. [140]
    [PDF] Hybrid MPI and OpenMP Parallel Programming
    Hybrid Parallel Programming. Parallel Programming Models on Hybrid Platforms ... Remarks on MPI and PGAS (UPC & CAF) ... Hybrid ...
  141. [141]
    Scalability: strong and weak scaling – PDC Blog - KTH
    Nov 9, 2018 · If we apply Gustafson's law to the previous example of s = 0.05 and p = 0.95, the scaled speedup will become infinity when infinitely many ...
  142. [142]
    Publications - Legion Programming System - Stanford University
    We present Legion, a programming model and runtime system for achieving high performance on these machines. Legion is organized around logical regions, which ...
  143. [143]
    TotalView Debugger - | HPC @ LLNL
    TotalView is a sophisticated and powerful tool used for debugging and analyzing both serial and parallel programs. TotalView provides source level debugging ...
  144. [144]
    DDT - NERSC Documentation
    DDT is a parallel debugger which can be run with up to 2048 processors. It can be used to debug serial, MPI, OpenMP, OpenACC, Coarray Fortran (CAF), UPC ( ...
  145. [145]
    Perforce TotalView HPC Debugging
    Perforce TotalView is the most advanced debugger for complex Python, Fortran, C, and C++ applications. Discover why.
  146. [146]
    TAU - Tuning and Analysis Utilities - - Computer Science
    TAU Performance System is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, UPC, Java, Python.
  147. [147]
    Vampir - | HPC @ LLNL - Lawrence Livermore National Laboratory
    Vampir is a full featured tool suite for analyzing the performance and message passing characteristics of parallel applications.
  148. [148]
    [PDF] Performance and Power Impacts of Autotuning of Kalman Filters for ...
    A speedup of 1.47x is achieved by ATLAS and the tuned linear algebra library when on the ARM machine. Algorithm level tuning of the filter improves this to 1.55 ...
  149. [149]
    What is HIP? - AMD ROCm documentation
    HIP supports the ability to build and run on either AMD GPUs or NVIDIA GPUs. GPU Programmers familiar with NVIDIA CUDA or OpenCL will find the HIP API familiar ...
  150. [150]
    GPU-HADVPPM4HIP V1.0: using the heterogeneous-compute ...
    Sep 13, 2024 · The results show that the CUDA and HIP technology to port HADVPPM from the CPU to the GPU can significantly improve its computational ...
  151. [151]
    MLKAPS: Machine Learning and Adaptive Sampling for HPC Kernel ...
    Jan 10, 2025 · This paper presents MLKAPS, a tool that automates this task using machine learning and adaptive sampling techniques.
  152. [152]
    Apptainer - Portable, Reproducible Containers
    Apptainer (formerly Singularity) simplifies the creation and execution of containers, ensuring software components are encapsulated for portability and ...
  153. [153]
    Cloud Simulations on Frontier Awarded Gordon Bell Special Prize ...
    Nov 16, 2023 · The Energy Exascale Earth System Model, or E3SM, project's Simple Cloud Resolving E3SM Atmosphere Model puts 40-year climate simulations, a ...
  154. [154]
    Large‐scale inverse model analyses employing fast randomized ...
    Jul 6, 2017 · We have developed a new computationally efficient technique for solving inverse problems with a large number of observations (e.g., on the order of 10^7 or ...
  155. [155]
    DOE Awards 38M Node-Hours of Computing Time to ... - HPCwire
    Jul 9, 2025 · The ALCC allocates researchers time on DOE's world-leading supercomputers to advance U.S. leadership in science and technology simulations.
  156. [156]
    GRChombo : Numerical relativity with adaptive mesh refinement
    Dec 3, 2015 · In this work, we introduce GRChombo: a new numerical relativity code which incorporates full adaptive mesh refinement (AMR) using block ...
  157. [157]
    [PDF] GRChombo: An adaptable numerical relativity code for fundamental ...
    The canonical example of this is the simulation of two black holes in orbit around each other, which permits extraction of the gravitational wave signal ...
  158. [158]
    Density functional theory: Its origins, rise to prominence, and future
    Aug 25, 2015 · This paper reviews the development of density-related methods back to the early years of quantum mechanics and follows the breakthrough in their application ...
  159. [159]
    Computational predictions of energy materials using density ...
    The attributes and limitations of DFT for the computational design of materials for lithium-ion batteries, hydrogen production and storage materials, ...
  160. [160]
    Exascale Simulations Underpin Quake-Resistant Infrastructure ...
    Sep 3, 2025 · The simulations reveal in stunning new detail how geological conditions influence earthquake intensity and, in turn, how those complex ground ...
  161. [161]
    Two Decades of High-Performance Computing at SCEC
    Nov 1, 2022 · SCEC's supercomputer allocations from the Department of Energy (DOE) and the National Science Foundation (NSF) over the last twenty years.
  162. [162]
    Department of Energy Awards 18 Million Node-Hours of Computing ...
    Jun 29, 2022 · 18 million node-hours have been awarded to 45 scientific projects under the Advanced Scientific Computing Research (ASCR) Leadership Computing Challenge (ALCC) ...
  163. [163]
    The Accelerated Strategic Computing Initiative - NCBI - NIH
    The goal of ASCI is to simulate the results of new weapons designs as well as the effects of aging on existing and new designs.
  164. [164]
    [PDF] Accelerated Strategic Computing Initiative (ASCI) Program Plan
    U.S. Department of Energy Defense Programs. Los Alamos National Laboratory ... Distributed Computing will develop an enterprise-wide integrated supercomputing ...
  165. [165]
    On the Path to the Nation's First Exascale Supercomputers
    Jun 15, 2017 · “The PathForward program is critical to the ECP's co-design process, which brings together expertise from diverse sources to address the four ...
  166. [166]
    NNSA and Livermore Lab achieve milestone with El Capitan, the ...
    Dec 10, 2024 · El Capitan as the world's most powerful supercomputer, achieving a groundbreaking 1.742 exaFLOPS (1.742 quintillion floating-point operations or calculations ...
  167. [167]
    El Capitan: NNSA's first exascale machine
    El Capitan's capabilities help researchers ensure the safety, security, and reliability of the nation's nuclear stockpile in the absence of underground testing.
  168. [168]
    Don't Be Fooled, Advanced Chips Are Important for National Security
    Feb 10, 2025 · Advanced chips enable nuclear deterrence, intelligence analysis, and are vital for weapon systems, driving strategic military advantage and ...
  169. [169]
    Supercomputers on Demand: Enhancing Defense Operations
    Jan 8, 2025 · Explore how supercomputers on demand enhance defense with scalable, cost-effective solutions for cyber defense, AI, and simulations.
  170. [170]
    [PDF] The History of the Department of Defense High-Performance ... - DTIC
    The Department of Defense (DOD) High-Performance Computing (HPC). Modernization Program (HPCMP) was created on 5 December 1991 when. President George H W Bush ...
  171. [171]
    AFRL's newest supercomputer 'Raider' promises to compute years ...
    Sep 11, 2023 · With modeling and simulation, the DOD can save years' worth of time and money in its laboratories, as the supercomputer allows researchers ...
  172. [172]
    Summary of Progress for the DoD HPCMP Hypersonic Vehicle ...
    Dec 29, 2021 · The DoD established a Hypersonic Vehicle Simulation Institute to improve simulation capabilities, addressing shortcomings in modeling and ...
  173. [173]
    Hypersonic Flight - HPCMP
    ... United States Department of Defense requires them to support hypersonic development programs. ... 2023 DoD High Performance Computing Modernization Program.
  174. [174]
    [PDF] 2022 ASC Computing Strategy - Department of Energy
    The ASC program underpins the nuclear deterrent by providing simulation capabilities and computational resources to support the entire weapons lifecycle from ...
  175. [175]
    Tracking large-scale AI models - Epoch AI
    Apr 5, 2024 · We present a new dataset tracking AI models with training compute over 10^23 floating point operations (FLOP). This corresponds to training ...
  176. [176]
    Distributed Parallel Training: Data Parallelism and Model Parallelism
    Sep 18, 2022 · There are two primary types of distributed parallel training: data parallelism and model parallelism. We further divide the latter into two ...
  177. [177]
  178. [178]
    NVIDIA H100 Tensor Core GPU - Colfax International
    NVIDIA H100 Tensor Core GPU: FP16 Tensor Core 1,979 / 1,513 teraFLOPS*; FP8 Tensor Core 3,958 / 3,026 teraFLOPS*; INT8 Tensor Core 3,958 ...
  179. [179]
    SC500: Microsoft Now Has the Third Fastest Computer in the World
    Nov 13, 2023 · Microsoft also claimed record GPT-3 training time on Eagle using the MLPerf benchmarking suite. The system trained a GPT-3 LLM generative ...
  180. [180]
    Trends in AI Supercomputers - arXiv
    We create a dataset of 500 AI supercomputers from 2019 to 2025 and analyze key trends in performance, power needs, hardware cost, ownership, and global ...
  181. [181]
    The Global HPC and AI Market, By the Numbers - HPCwire
    Sep 22, 2025 · Hyperion found that the middle of the HPC/AI market was the fastest growing in 2024. Large HPC systems, or those that cost from $1 billion to ...
  182. [182]
    ExxonMobil announces Discovery 6 supercomputer to power oil and ...
    Mar 20, 2025 · ExxonMobil announces Discovery 6 supercomputer to power oil and gas deposit mapping technology ... Oil and gas giant ExxonMobil has unveiled its ...
  183. [183]
    HPE supercomputing capabilities increase ExxonMobil's 4D seismic ...
    Mar 13, 2025 · Researchers use supercomputers, which are purpose-built to handle complex data, to turn sound wave data into detailed 3D images of the earth's ...
  184. [184]
    ExxonMobil sets record in high performance oil and gas reservoir ...
    Feb 16, 2017 · Proprietary software demonstrates record performance using 716,800 computer processors · Reservoir development scenarios can be examined ...
  185. [185]
    High Performance Computing for Financial Services - IBM
    In the past, we have seen banks long rely on Monte Carlo simulations—calculations that can help predict the probability of a variety of outcomes against ...
  186. [186]
    [PDF] Real-World Examples of Supercomputers Used for Economic and ...
    Through modeling and simulation, private sector participants have improved well recovery and reduced failure risk. Type of ROI. Process improvement resulting in ...
  187. [187]
    The Role of High-Performance Computing in Modern Supply Chain ...
    Sep 11, 2024 · High-performance computing helps logistics and distribution by making route planning and logistical simulations more efficient.
  188. [188]
    Private-sector companies own a dominant share of GPU clusters
    Jun 5, 2025 · Private sector's share of AI computing capacity grew from 40% in 2019 to 80% in 2025, outpacing public sector growth. The largest private ...
  189. [189]
    Supercomputers Market Size, Analysis, Share to [2025-2033]
    The global supercomputers market size was USD 7.9 billion in 2024 & is projected to grow from USD 8.66 billion in 2025 to USD 18.03 billion by 2033.
  190. [190]
    High-Throughput Compute - EGI Federation
    With over 1 million cores of installed capacity, EGI can support over 1.6 million computing jobs per day, making it one of the most powerful and versatile ...
  191. [191]
    EGI - Advanced Computing Services for Research
    EGI is an international federation delivering open solutions for advanced computing and data analytics in research and innovation.
  192. [192]
    Folding@home project is crunching data twice as fast as the top ...
    Mar 23, 2020 · It's now cranking out 470 petaflops of number-crunching performance. Like other distributed computing projects, Folding@home draws on the ...
  193. [193]
    2020 in review, and happy new year 2021! - Folding@home
    Jan 5, 2021 · Folding@home became the first exascale computer, having over 5-fold greater performance than the world's fastest supercomputer at the time.
  194. [194]
    What percent of SETI's computing power came from the ... - Quora
    May 17, 2016 · Each is capable of a 6 billion point FFT each second for a total of 1.2 TFLOP or 0.46% of SETI@home. In the next decade, the Breakthrough Listen ...
  195. [195]
    [PDF] Volunteer computing: the ultimate cloud - BOINC
    To achieve high throughput, the use of distributed computing, in which jobs are run on networked computers, is often more cost-effective than supercomputing.
  196. [196]
    Methods and mechanisms of security in Grid Computing - IEEE Xplore
    May 4, 2015 · In contrast, in heterogeneous systems proper attention has to be given to the security of the data due to the increasing need of computing, data ...
  197. [197]
    [PDF] Volunteer Computing and Cloud Computing: Opportunities for Synergy
    How many volunteer nodes are equivalent to 1 cloud node? 2.8 active volunteer hosts per 1 cloud node. (Total performance still orders of magnitude better).
  198. [198]
    Volunteer computing: requirements, challenges, and solutions
    Volunteer computing is a form of network based distributed computing, which allows public participants to share their idle computing resources, and helps ...
  199. [199]
    AWS Perfects Cloud Service for Supercomputing Customers
    Aug 29, 2024 · The Parallel Computing Service (PCS) is a managed service offering allowing customers to set up and manage high-performance computing (HPC) clusters.
  200. [200]
    kjrstory/awesome-cloud-hpc: A curated list of Cloud HPC. - GitHub
    AWS ParallelCluster - Open source cluster management tool for deploying and managing HPC clusters (Repository). AWS ParallelCluster UI - Front-end for AWS ...
  201. [201]
    AWS Parallel Computing Service vs. Azure HPC Comparison
    Compare AWS Parallel Computing Service vs. Azure HPC using this comparison chart. Compare price, features, and reviews of the software side-by-side to make ...
  202. [202]
    5 Top Cloud Service Providers in 2025 Compared - DataCamp
    Aug 12, 2025 · Top Cloud Service Providers in 2025 · 1. Amazon Web Services (AWS) · 2. Microsoft Azure · 3. Google Cloud Platform (GCP) · 4. IBM Cloud · 5. Oracle ...
  203. [203]
    Top 12 Cloud GPU Providers for AI and Machine Learning in 2025
    Sep 29, 2025 · This side-by-side comparison breaks down 12 top cloud GPU providers, highlighting key hardware, pricing structures, and standout features.
  204. [204]
    90+ Cloud Computing Statistics: A 2025 Market Snapshot - CloudZero
    May 12, 2025 · Accenture also found that moving workloads to the public cloud leads to Total Cost of Ownership (TCO) savings of 30-40%.
  205. [205]
    Hybrid Cloud Advantages & Disadvantages - IBM
    Business and IT leaders need to review the advantages and disadvantages of hybrid cloud adoption to reap its benefits.
  206. [206]
    What Is Hybrid Cloud? Use Cases, Pros and Cons - Oracle
    Feb 29, 2024 · Having an on-premises data center plus multiple cloud providers can make total technology cost assessments complicated for a given process.
  207. [207]
    Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework
    Sep 30, 2025 · Cloud providers like Microsoft, Amazon, and Google report 15–25% year-over-year growth in AI workloads (Kirkpatrick and Newman, 2025; Wheeler, ...
  208. [208]
    Hybrid Cloud Explained: Benefits, Use Cases & Architecture
    Aug 5, 2025 · Hybrid cloud architecture offers flexibility, control, and scalability across on-prem and cloud systems. Is hybrid cloud secure? Yes—sensitive ...
  209. [209]
    GeoCoded Special Report: State of Global AI Compute (2025 Edition)
    Aug 21, 2025 · This report takes stock of the world's AI computing infrastructure in mid-2025, highlighting who controls the most "digital horsepower," how ...
  210. [210]
    Road to El Capitan 11: Industry investment | Computing
    Nov 13, 2024 · El Capitan will come online in 2024 with the processing power of more than 2 exaflops, or 2 quintillion (10^18) calculations per second.
  211. [211]
    Procurement contract for JUPITER, the first European exascale ...
    Oct 3, 2023 · The EuroHPC JU will fund 50% of the total cost of the new machine and the other 50% will be funded in equal parts by the German Federal Ministry ...
  212. [212]
    EuroHPC JU - LUMI supercomputer
    ERDF funding for 4.2 million Euros is granted for the period of 10.06.2019-31.12.2020. This funding enables EuroHPC's supercomputer to be located at CSC's ...
  213. [213]
    Moldova Joins the EuroHPC Joint Undertaking - European Union
    Oct 8, 2025 · ... Joint Undertaking (EuroHPC JU), Moldova became the 37th participating state to join the initiative to lead the way in European supercomputing.
  214. [214]
    EuroHPC JU selects AI Factory Antennas to broaden AI Factories ...
    Oct 13, 2025 · The European Union will fund the AI Factory Antennas with an investment of around €55 million, matched by contributions from the participating ...
  215. [215]
    Nvidia GPUs and Fujitsu Arm CPUs will power Japan's next $750M ...
    Aug 23, 2025 · Japan is investing over $750 million in FugakuNEXT, a zetta-scale supercomputer built by RIKEN and Fujitsu. Powered by FUJITSU-MONAKA3 CPUs ...
  216. [216]
    RIKEN, Japan's Leading Science Institute, Taps Fujitsu and NVIDIA ...
    it's a strategic investment in Japan's future. Backed by Japan's MEXT (Ministry of Education ...
  217. [217]
    Japan plans 1000 times more powerful supercomputer than US ...
    Jun 18, 2025 · Japan is investing over $750 million to develop FugakuNEXT to accelerate AI and scientific research.
  218. [218]
  219. [219]
    Ranked: Top Countries by Computing Power - Visual Capitalist
    Dec 1, 2024 · We visualized data from the latest TOP500 ranking to reveal the top countries by computing power, based on their supercomputing capacity.
  220. [220]
    [PDF] Commerce Implements New Export Controls on Advanced ...
    Oct 7, 2022 · BIS's rule on advanced computing and semiconductor manufacturing addresses U.S. national security and foreign policy concerns in two key areas.
  221. [221]
    Balancing the Ledger: Export Controls on U.S. Chip Technology to ...
    Feb 21, 2024 · The Dutch decision to block exports of ASML's most advanced extreme ultraviolet (EUV) lithography tools should, in principle, foreclose China's ...
  222. [222]
    The Limits of Chip Export Controls in Meeting the China Challenge
    Apr 14, 2025 · The implementation of controls significantly disrupted China's semiconductor ecosystem, causing price spikes for some device types and forcing ...
  223. [223]
    China's secretive Sunway Pro CPU quadruples performance over its ...
    Nov 24, 2023 · China's secretive Sunway Pro CPU quadruples performance over its predecessor, allowing the supercomputer to hit exaflop speeds ...
  224. [224]
    What's Inside China's New Homegrown “Tianhe Xingyi ...
    Dec 6, 2023 · China is using a domestic processor as the backbone for double the performance of the Tianhe-2 system, which topped the Top 500 starting in ...
  225. [225]
    China's AI Models Are Closing the Gap—but America's Real ... - RAND
    May 2, 2025 · While Chinese models close the gap on benchmarks, the United States maintains an advantage in total compute capacity—owning far more, and more ...
  226. [226]
    America's AI Lead over China: Here's Why It Will Continue
    Jul 1, 2025 · China controls just 15 percent of global AI compute capacity compared to America's 75 percent. US export controls have made this imbalance worse ...
  227. [227]
    China hit hard by new Dutch export controls on ASML chip-making ...
    Sep 16, 2024 · ASML is barred from shipping to China its most advanced EUV systems, necessary for making chips smaller than 7-nanometres, as well as immersion ...
  228. [228]
    What Is the xAI Supercomputer (Colossus)? | Built In
    Jul 29, 2025 · Built by xAI, Colossus is currently the world's largest supercomputer, located in an industrial park in Tennessee's South Memphis neighborhood.
  229. [229]
    Inside Memphis' Battle Against Elon Musk's xAI Data Center | TIME
    Aug 13, 2025 · The supercomputer, named Colossus, consisted of a staggering 230,000 Nvidia GPUs, a sheer training power that allowed Musk to vault past his ...
  230. [230]
    Data on GPU clusters - Epoch AI
    Private-sector companies own a dominant share of GPU clusters. The private sector's share of global AI computing capacity has grown from 40% in 2019 to 80% in ...
  231. [231]
    NVIDIA Commits US$100 Billion to OpenAI in Landmark AI ...
    Sep 23, 2025 · NVIDIA announced a US$100 billion investment in OpenAI and a partnership to build 10GW of data centers powered with millions of GPUs.
  232. [232]
    NVIDIA DGX Spark
    Delivering the power of an AI supercomputer in a desktop-friendly size, NVIDIA DGX Spark is ideal for AI developer, researcher, and data scientist workloads.
  233. [233]
    NVIDIA DGX Spark Arrives for World's AI Developers
    Oct 13, 2025 · Built on the NVIDIA Grace Blackwell architecture, DGX Spark integrates NVIDIA GPUs, CPUs, networking, CUDA libraries and NVIDIA AI software, ...
  234. [234]
    U.S. Export Controls and China: Advanced Semiconductors
    Sep 19, 2025 · Initial actions tightening controls have included adding 42 PRC entities to the EL in March 2025 and another 23 PRC entities in September 2025; ...
  235. [235]
    Additions to the Entity List - Federal Register
    Mar 28, 2025 · In this rule, the Bureau of Industry and Security (BIS) amends the Export Administration Regulations (EAR) by adding 12 entities to the Entity List.
  236. [236]
    all-press-releases | Bureau of Industry and Security
    27 Chinese entities are added for acquiring or attempting to acquire U.S.-origin items in support of China's military modernization. These entities have ...
  237. [237]
    Did U.S. Semiconductor Export Controls Harm Innovation? - CSIS
    Nov 5, 2024 · A study of 30 leading semiconductor firms finds that recent U.S. export controls aimed at China have not hindered innovation.
  238. [238]
    Trump announces private-sector $500 billion investment in AI ...
    Jan 21, 2025 · US President Donald Trump on Tuesday announced a private sector investment of up to $500 billion to fund infrastructure for artificial intelligence.
  239. [239]
    The Journey to Frontier | ORNL
    Nov 14, 2023 · Today's exascale supercomputer not only keeps running long enough to do the job but at an average of only around 30 megawatts. That's a little ...
  240. [240]
    European Jupiter Supercomputer Inaugurated with Exascale ...
    Sep 8, 2025 · This is important, as Jupiter comes with a price tag of 500 million euros, including six years of operation. The LUMI supercomputer in Finland, ...
  241. [241]
    What We Know about Alice Recoque, Europe's Second Exascale ...
    Jun 24, 2024 · The supercomputer will cost about €544 million. It will be installed at CEA's TGCC supercomputing center at Bruyères-le-Châtel, about 25 miles ...
  242. [242]
    Big tech has spent $155bn on AI this year. It's about to spend ...
    Aug 3, 2025 · Tech giants have spent more on AI than the US government has on education, jobs and social services in 2025 so far.
  243. [243]
    The ROI on HPC? $44 in profit for every $1 in HPC - HPCwire
    Sep 7, 2020 · A study by Hyperion Research finds that high performance computing generates $44 in profit for every dollar of investment in HPC systems.
  244. [244]
    SROI of CSC's high-performance computing services studied
    Apr 3, 2024 · A study by Taloustutkimus found that an investment of €1 into CSC-IT Center for Science's high-performance computing (HPC) services generated a €25-37 benefit ...
  245. [245]
    Frontier: Step By Step, Over Decades, To Exascale - The Next Platform
    May 30, 2022 · While Oak Ridge can deploy up to 100 megawatts for its computing, it costs roughly a dollar per watt per year to do this – so $100 million – and ...
  246. [246]
    NAACP files intent to sue Elon Musk's xAI company over Memphis ...
    Jun 17, 2025 · The NAACP filed an intent to sue Elon Musk's artificial intelligence company xAI on Tuesday over concerns about air pollution generated by a supercomputer.
  247. [247]
    Elon Musk's xAI accused of pollution over Memphis supercomputer
    Apr 25, 2025 · “It is appalling that xAI would operate more than 30 methane gas turbines without any permits or any public oversight,” said Amanda Garcia, a ...
  248. [248]
    Efficiency, Power, ...
    Rmax and Rpeak values are in GFlops. For more details about other fields, check the TOP500 description.
  249. [249]
    AI's Growing Carbon Footprint - State of the Planet
    Jun 9, 2023 · Because of the energy the world's data centers consume, they account for 2.5 to 3.7 percent of global greenhouse gas emissions, exceeding even ...
  250. [250]
    Combining AI and physics-based simulations to accelerate COVID ...
    Sep 7, 2022 · Researchers from University College London are using ALCF supercomputers and machine learning methods to speed up the search for promising new drugs.
  251. [251]
    Artificial intelligence in drug discovery and development - PMC
    AI is used in drug discovery, development, repurposing, clinical trials, and product management, improving the overall life cycle of pharmaceutical products.
  252. [252]
    World's most energy-efficient AI supercomputer comes online - Nature
    Sep 12, 2025 · JUPITER, the European Union's new exascale supercomputer, is 100% powered by renewable energy. Can it compete in the global AI race?
  253. [253]
    Elon Musk's xAI supercomputer stirs turmoil over smog in Memphis
    Sep 11, 2024 · MLGW is adamant that xAI won't impact the grid or water availability. It also says it's in talks with the company to build a gray water plant to ...
  254. [254]
    AI chips are getting hotter. A microfluidics breakthrough goes ...
    Sep 24, 2025 · Researchers say microfluidics could boost efficiency and improve sustainability for next-generation AI chips. Most GPUs operating in today's ...
  255. [255]
    Responding to the climate impact of generative AI | MIT News
    Sep 30, 2025 · MIT experts discuss strategies and innovations aimed at mitigating the amount of greenhouse gas emissions generated by the training, ...
  256. [256]
    Europe's supercomputers hijacked by attackers for crypto mining
    May 18, 2020 · At least a dozen supercomputers across Europe have shut down after cyber-attacks tried to take control of them.
  257. [257]
    Security incident knocks Archer supercomputer service offline for days
    May 14, 2020 · Security incident knocks UK supercomputer service offline for days. Scientists use the service to model climate change, coronavirus, and other ...
  258. [258]
    Significant Cyber Incidents | Strategic Technologies Program - CSIS
    October 2024: Chinese hackers hacked cellphones used by senior members of the Trump-Vance presidential campaign, including phones used by former President ...
  259. [259]
    DOD Introduces New Supercomputer Focused on Biodefense ...
    Aug 15, 2024 · The biodefense-focused system will provide unique capabilities for large-scale simulation and AI-based modeling for a variety of defensive ...
  260. [260]
    DOD unveils new biodefense-focused supercomputer - Nextgov/FCW
    Aug 16, 2024 · The Department of Defense and National Nuclear Security Administration have a new supercomputing system focused on biological defense at the Lawrence Livermore ...
  261. [261]
    The Ethics of Acquiring Disruptive Military Technologies
    Jan 27, 2020 · A framework for assessing the moral effect, necessity, and proportionality of disruptive technologies to determine whether and how they should be developed.
  262. [262]
  263. [263]
    The Case Against Google's Claims of “Quantum Supremacy”: A Very ...
    Dec 9, 2024 · Thus, from the quantum supremacy point of view, Sycamore's role in the race between classical and quantum computers has largely been eclipsed by ...
  264. [264]
    Frontier supercomputer debuts as world's fastest, breaking exascale ...
    May 30, 2022 · Frontier features a theoretical peak performance of 2 exaflops, or two quintillion calculations per second, making it ten times more powerful ...
  265. [265]
    Celebrating one year of achieving exascale with Frontier, world's ...
    May 22, 2023 · Frontier is the world's first and fastest exascale supercomputer, built for the U.S. Department of Energy, and is faster than the next four ...
  266. [266]
    El Capitan Takes Exascale Computing to New Heights - AMD
    Jan 10, 2025 · Both El Capitan and Frontier sit under the umbrella of the US Department of Energy (DOE). Housed at Lawrence Livermore National Laboratory ...
  267. [267]
    El Capitan retains No. 1 supercomputer ranking - Network World
    Jun 10, 2025 · The El Capitan system at Lawrence Livermore National Laboratory in California maintained its title as the world's fastest supercomputer.
  268. [268]
    NYU Unveils 'Torch'—The Most Powerful Supercomputer in New ...
    Oct 9, 2025 · Named for the University's iconic logo, Torch is five times more powerful than NYU's current supercomputer, Greene, with the capability to do ...
  269. [269]
    Lincoln Lab unveils the most powerful AI supercomputer at any US ...
    Oct 2, 2025 · MIT Lincoln Laboratory's newest supercomputer is the most powerful AI system at a U.S. university. Equipped for generative AI applications, ...
  270. [270]
    NVIDIA Puts Grace Blackwell on Every Desk and at Every AI ...
    Jan 6, 2025 · Project DIGITS features the new NVIDIA GB10 Grace Blackwell Superchip, offering a petaflop of AI computing performance for prototyping, fine-tuning and running ...
  271. [271]
    NVIDIA Sweeps New Ranking of World's Most Energy-Efficient ...
    May 21, 2024 · In the latest Green500 ranking of the most energy-efficient supercomputers, NVIDIA-powered systems swept the top three spots.
  272. [272]
    Eviden's Supercomputers Ranked #1 and #2 for Energy Efficiency ...
    Nov 19, 2024 · Eviden's JEDI module is ranked #1 and ROMEO 2025 is #2 on the Green500 list for energy efficiency. Eviden also has a 6th place ranking.
  273. [273]
    Japan Announces Plans for a Zetta-Scale Supercomputer by 2030
    Sep 12, 2024 · Japan Announces Plans for a Zetta-Scale Supercomputer by 2030. It aims to be 1,000 times more powerful than the AMD-powered Frontier exascale ...
  274. [274]
    Forget Zettascale, Trouble is Brewing in Scaling Exascale ... - HPCwire
    Nov 14, 2023 · In 2021, Intel famously declared its goal to get to zettascale supercomputing by 2027, or scaling today's Exascale computers by 1,000 times.
  275. [275]
    US's DOE Details the Next Major Supercomputer - HPCwire
    Jan 13, 2025 · US's DOE Details the Next Major Supercomputer; A Companion to El Capitan ... The network bandwidth could be a mix of Ethernet and Infiniband.
  276. [276]
    Dennard's Law - Semiconductor Engineering
    Dennard's Law states that as the dimensions of a device go down, so does power consumption. While this held, smaller transistors ran faster, used less power ...
  277. [277]
    Getting To Zettascale Without Needing Multiple Nuclear Power Plants
    Mar 3, 2023 · The crux of the challenge will be energy efficiency. While the performance of datacenter servers is doubling every 2.4 years, HPC computing every 1.2 years, ...
  278. [278]
    From Exascale, towards building Zettascale general purpose & AI ...
    May 17, 2023 · The projected supercomputer in 2035 that will deliver Zettascale performance would consume 500 megawatts of power at an energy efficiency of 2140 GFlops/watt.
  279. [279]
    Moving from exascale to zettascale computing: challenges and ...
    In this study, we discuss the challenges of enabling zettascale computing with respect to both hardware and software. We then present a perspective of future ...
  280. [280]
    IBM and AMD to Develop Quantum-Centric Supercomputing
    Sep 4, 2025 · This hybrid approach is a pragmatic response to the current state of the technology. The industry is in the “Noisy Intermediate-Scale Quantum” ( ...
  281. [281]
    IBM and AMD Announce Strategic Partnership to Develop Hybrid ...
    Aug 26, 2025 · IBM and AMD have announced a strategic collaboration aimed at building hybrid supercomputing systems that combine quantum and classical ...
  282. [282]
    Building software for quantum-centric supercomputing - IBM
    Sep 15, 2025 · Explore open-source tools that IBM and its partners are creating to enable seamless integrations of quantum and classical high-performance ...
  283. [283]
    IBM and RIKEN Unveil First IBM Quantum System Two Outside of ...
    Jun 24, 2025 · IBM and RIKEN have launched the first IBM Quantum System Two outside the U.S., co-located with the Fugaku supercomputer in Japan.
  284. [284]
    Superconducting quantum computers: who is leading the future?
    Aug 19, 2025 · IBM Quantum has introduced the IBM Condor, an innovative quantum processor featuring 1121 superconducting qubits and leveraging IBM's state-of- ...
  285. [285]
  286. [286]
  287. [287]
  288. [288]
  289. [289]
    Intel Builds World's Largest Neuromorphic System to Enable More ...
    Apr 17, 2024 · Hala Point, the industry's first 1.15 billion neuron neuromorphic system, builds a path toward more efficient and scalable AI.
  290. [290]
    Neuromorphic Computing and Engineering with AI | Intel®
    Research using Loihi 2 processors has demonstrated orders of magnitude gains in the efficiency, speed, and adaptability of small-scale edge workloads.
  291. [291]
    Neuromorphic Computing - An Overview - arXiv
    Oct 17, 2025 · Loihi, being specialized for specific SNNs, uses a network of physical artificial neurons and synapses, which are connected in a manner that is ...
  292. [292]
    What Is Quantum Optimization? Research Team Offers Overview of ...
    Nov 18, 2024 · Quantum optimization algorithms offer new approaches that might streamline computations, improve accuracy, and even reduce energy costs.
  293. [293]
    The neurobench framework for benchmarking neuromorphic ...
    Feb 11, 2025 · Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles.