
High-performance computing

High-performance computing (HPC), also known as supercomputing, refers to the aggregation of computing power from multiple processors or interconnected computers to perform complex calculations and simulations at speeds far exceeding those of standard desktop or workstation systems. This approach leverages parallel processing techniques, in which tasks are divided across numerous processing units to solve large-scale problems in fields such as science, engineering, and business that would otherwise be computationally infeasible or excessively time-consuming.

The history of HPC traces back to the mid-20th century, driven by the need for advanced computational capabilities in military and scientific applications. Early milestones include the ENIAC (1945), the first general-purpose electronic computer, which marked the beginning of programmable computing for complex calculations. The 1960s saw the emergence of dedicated supercomputers like the CDC 6600 (1964), which achieved speeds of up to 3 MFLOPS and is often credited as the first true supercomputer. The 1970s and 1980s were dominated by vector-processing machines from Cray Research, such as the Cray-1 (1976) at 160 MFLOPS, which introduced innovative cooling and architecture to push performance boundaries. By the 1990s, the shift to massively parallel processing and affordable clusters, exemplified by the Beowulf project (1994), which used off-the-shelf PCs and within a few years delivered roughly 1 GFLOP/s for under $50,000, democratized access to high performance. Moore's Law, positing that transistor density doubles approximately every two years, fueled this evolution until physical limits on clock speeds led to multi-core processors and hybrid systems integrating GPUs in the 2000s. Today, systems like El Capitan (2024) at Lawrence Livermore National Laboratory exceed 1.7 exaFLOPS as of June 2025, ranking atop the TOP500 list of the world's fastest supercomputers and marking the exascale era, with multiple systems surpassing 1 exaFLOPS.

Key technologies in HPC include clusters of interconnected nodes, each comprising multi-core CPUs, high-bandwidth memory, and accelerators such as GPUs with thousands of CUDA cores, enabling massive parallelism. High-speed interconnects such as InfiniBand facilitate low-latency communication between nodes, while scalable storage systems handle petabytes of data. Software ecosystems, including parallel programming models like MPI (Message Passing Interface) and OpenMP, optimize workload distribution, and benchmarks like HPL (High-Performance Linpack) measure system performance in floating-point operations per second (FLOPS). Recent advancements incorporate machine learning accelerators and energy-efficient architectures to address power consumption challenges, with modern systems drawing tens of megawatts.

HPC plays a pivotal role in addressing large-scale computational problems across disciplines, enabling breakthroughs that would be impossible with conventional computing. In scientific research, it powers climate modeling, physical simulations, and seismic analysis to predict environmental changes and natural disasters. Engineering applications include aerodynamic design and automotive crash simulations, accelerating innovation in industries such as aerospace and automotive manufacturing. In healthcare and bioinformatics, HPC facilitates drug discovery and genomic sequencing by processing vast datasets rapidly. Emerging uses in artificial intelligence and data analytics further amplify its impact, supporting model training on exabyte-scale information across scientific and commercial fields. Overall, HPC's scale and speed reduce computation times from years to days, fostering informed decision-making and innovation through enhanced research productivity.

Fundamentals

Definition and Scope

High-performance computing (HPC) refers to the aggregation of computational resources to perform advanced calculations and simulations that exceed the capabilities of standard desktop or server systems, enabling the solution of complex scientific and engineering problems. This involves leveraging multiple processors or nodes working together to achieve significantly higher performance levels, often measured in floating-point operations per second (FLOPS). The scope of HPC encompasses supercomputing systems, distributed clusters, and frameworks designed for large-scale simulation and modeling, such as those used in climate or molecular simulations. These systems prioritize sustained high-speed computation for resource-intensive tasks that require massive parallelism, distinguishing HPC from conventional computing by its focus on parallelism and throughput in handling massive, multidimensional datasets.

HPC differs from high-throughput computing (HTC), which emphasizes queuing and executing numerous independent tasks over extended periods to maximize overall resource utilization, whereas HPC targets peak performance for tightly coupled workloads that demand rapid exchange of intermediate results between components. In HTC, the goal is long-term productivity through sustained throughput of many independent jobs, while HPC seeks immediate, high-intensity bursts of computation. The term "high-performance computing" emerged in the 1990s as a broader descriptor replacing "supercomputing," reflecting the shift toward accessible cluster-based systems rather than specialized monolithic machines, driven by advances in commodity hardware and networking.

Examples of HPC problem scales include petascale computing, which operates at approximately 10^15 FLOPS to model phenomena like weather patterns or molecular interactions, and exascale computing, operating at 10^18 FLOPS to enable finer-grained simulations such as full-system climate modeling or simulations of matter at atomic levels. Petascale systems represent a foundational milestone first achieved in 2008, while exascale provides a thousandfold increase in capability for unprecedented accuracy in predictive modeling, as demonstrated since 2022.
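A rough worked comparison makes the thousandfold gap concrete (the workload size below is an illustrative assumption, not a benchmark figure). A job requiring 10^18 floating-point operations would take

\[
t_{\text{petascale}} = \frac{10^{18}\ \text{ops}}{10^{15}\ \text{ops/s}} = 10^{3}\ \text{s} \approx 17\ \text{minutes},
\qquad
t_{\text{exascale}} = \frac{10^{18}\ \text{ops}}{10^{18}\ \text{ops/s}} = 1\ \text{s},
\]

ignoring parallel efficiency losses, I/O, and communication overheads that keep sustained rates below theoretical peak in practice.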

Key Concepts

High-performance computing (HPC) relies on parallel programming paradigms to distribute workloads across multiple processing elements, enabling efficient execution of complex simulations and data analyses. Data parallelism involves dividing a large dataset into smaller subsets, with each processing element performing the same operation on its assigned portion simultaneously, which is particularly effective for tasks like matrix multiplications or image processing. In contrast, task parallelism decomposes a program into independent subtasks that can execute concurrently on different processors, allowing diverse operations such as processing one data stream while analyzing another. Combinations of these, known as hybrid parallelism, integrate data and task approaches, often using MPI for inter-node coordination and OpenMP within nodes to optimize resource utilization in cluster environments.

Scalability in HPC refers to a system's ability to maintain or improve performance as computational resources, such as the number of processors, increase. Amdahl's law provides a theoretical limit on speedup for fixed-size problems, stating that the maximum speedup S achievable with s processors is given by:

S = \frac{1}{(1 - p) + \frac{p}{s}}

where p is the fraction of the program that can be parallelized, and 1 - p is the inherently serial portion that remains a bottleneck regardless of added processors. This law highlights that even small serial fractions can severely cap overall gains, emphasizing the need to minimize sequential code in HPC applications. As a counterpoint, Gustafson's law addresses scaled problems where the workload expands with available resources, proposing that the speedup S with s processors is:

S = (1 - p) + s\,p

where p is now the parallelizable fraction of the scaled problem, allowing near-linear speedups since serial components grow proportionally slower than parallel ones. This perspective is more applicable to many HPC scenarios, such as climate modeling, where larger datasets leverage additional processors effectively.

Memory hierarchies in HPC optimize data access by organizing storage across levels with varying speed, capacity, and cost, from fast but small caches to slower, larger main memory. Cache memory forms the fastest levels closest to the cores, typically including L1 (on-core, kilobytes, sub-nanosecond access), L2 (per-core or shared, megabytes, a few nanoseconds), and L3 (shared across cores, tens of megabytes, tens of nanoseconds), which temporarily hold frequently used data to reduce the latency of main-memory fetches. In shared memory models, all processors access a unified address space, facilitating easy data sharing but limited to single nodes due to hardware constraints like bus contention. Conversely, distributed memory models assign private memory to each processor or node, requiring explicit communication via message passing for data exchange, which scales better for large clusters but introduces overhead from network latency.

A fundamental metric for evaluating HPC performance is FLOPS (floating-point operations per second), quantifying the rate of arithmetic computations essential for scientific simulations. Tiers include teraFLOPS (10^12 operations per second) for mid-range systems, petaFLOPS (10^15) for leading supercomputers until the early 2020s, and exaFLOPS (10^18) for exascale capabilities achieved since 2022, providing a standardized measure of sustained computational power.
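The contrast between the two laws is easiest to see numerically. The following minimal C sketch evaluates both formulas for an assumed parallel fraction p = 0.95 (an illustrative value, not a measurement of any real code):

```c
/* Minimal sketch: compare Amdahl's and Gustafson's predicted speedups
 * for an assumed parallel fraction p across several processor counts.
 * All values are illustrative, not measurements of a real application. */
#include <stdio.h>

static double amdahl(double p, double s)    { return 1.0 / ((1.0 - p) + p / s); }
static double gustafson(double p, double s) { return (1.0 - p) + s * p; }

int main(void) {
    const double p = 0.95;                        /* assumed parallelizable fraction */
    const double procs[] = {16, 256, 4096, 65536};
    printf("%10s %12s %14s\n", "processors", "Amdahl", "Gustafson");
    for (int i = 0; i < 4; i++)
        printf("%10.0f %12.1f %14.1f\n", procs[i],
               amdahl(p, procs[i]), gustafson(p, procs[i]));
    return 0;
}
```

With p = 0.95, Amdahl's fixed-size speedup saturates near 1/(1 - p) = 20 no matter how many processors are added, while Gustafson's scaled speedup keeps growing almost linearly, which is why weak-scaling arguments dominate planning for very large systems.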

History

Early Developments

The origins of high-performance computing (HPC) trace back to the mid-20th century, when the demand for rapid numerical calculations during World War II spurred the development of the first programmable electronic computers. The ENIAC (Electronic Numerical Integrator and Computer), completed in 1945 at the University of Pennsylvania, represented a foundational milestone as the first general-purpose electronic digital computer, designed primarily for ballistics calculations to support U.S. Army efforts. Funded by the U.S. Army Ordnance Department amid the escalating needs of the war, ENIAC utilized approximately 18,000 vacuum tubes to achieve speeds of up to 5,000 additions per second, dramatically reducing computation times from hours to seconds for complex problems. However, its reliance on vacuum tube technology introduced significant challenges, including high power consumption of 150 kW, which generated substantial heat requiring massive cooling systems weighing several tons, and frequent tube failures that compromised reliability, with mean time between failures often measured in hours.

The transition into the 1950s and 1960s marked a shift toward more specialized architectures optimized for scientific computing, driven by Cold War imperatives in nuclear research and aerospace engineering. Vector processors emerged as a key innovation, enabling efficient handling of arrays of data for high-speed numerical operations. The CDC 6600, designed by Seymour Cray and released by Control Data Corporation in 1964, is widely regarded as the first supercomputer, achieving a peak performance of 3 MFLOPS through its innovative use of multiple functional units and transistor-based logic that superseded vacuum tubes. This machine addressed some reliability issues of earlier systems by leveraging transistors, which were more durable and power-efficient, though cooling remained a persistent challenge due to the heat from densely packed components operating at high speeds. U.S. government agencies played a pivotal role in these advancements; the Atomic Energy Commission (AEC, predecessor to the Department of Energy or DOE) funded supercomputing at national laboratories such as Los Alamos and Lawrence Livermore for nuclear simulations, while NASA supported vector-processor development for aerospace modeling.

By the 1970s, the focus evolved toward parallel processing to overcome the limitations of single-processor designs, laying groundwork for scalable HPC systems. The ILLIAC IV, designed at the University of Illinois and operational in 1972, pioneered this approach with its array of 64 processing elements arranged in a single-instruction, multiple-data (SIMD) configuration, enabling simultaneous operations on large datasets for applications like fluid dynamics simulations. Funded primarily by the Advanced Research Projects Agency (ARPA) as part of broader efforts to enhance computational capabilities for defense and scientific research, the ILLIAC IV demonstrated the potential of parallel architectures but highlighted ongoing challenges in programmability and interconnect reliability, compounded by the thermal management demands of its transistor arrays. These early systems, supported by federal initiatives from the AEC, ARPA, and military agencies, established the conceptual and technological foundations for HPC, emphasizing the interplay between hardware innovation, government investment, and the imperative to solve computationally intensive problems in an era of geopolitical tension.

Major Milestones

The TOP500 project, launched in 1993, established a biannual ranking of the world's 500 most powerful supercomputers based on their performance in the High-Performance Linpack benchmark, providing a standardized metric to track advancements in high-performance computing (HPC) capabilities. This initiative, started by researchers at the University of Mannheim and the University of Tennessee, marked a pivotal milestone by fostering global competition and enabling systematic analysis of HPC trends, with the first list published in June 1993 featuring the Thinking Machines CM-5 as the top system.

In 1996, the U.S. Department of Energy (DOE) initiated the Accelerated Strategic Computing Initiative (ASCI), a program aimed at developing advanced simulation capabilities to support stockpile stewardship in the absence of nuclear testing, which accelerated the push toward terascale computing. The ASCI effort involved collaborations among national laboratories and industry partners, culminating in systems like ASCI White, which by 2000 achieved terascale performance exceeding 7 teraFLOPS, enabling complex multidimensional simulations previously infeasible.

The introduction of the Green500 list in November 2007 represented a significant shift toward energy efficiency in HPC, ranking supercomputers by performance per watt to complement raw computational power metrics and encourage sustainable designs. Developed by researchers at Virginia Tech, the list highlighted the growing power consumption challenges in scaling HPC systems, with the inaugural edition showing efficiencies up to 357 megaFLOPS per watt, prompting innovations in hardware and cooling technologies.

The transition to petascale computing was epitomized by IBM's Roadrunner supercomputer, deployed at Los Alamos National Laboratory in 2008, which became the first to sustain over 1 petaFLOP/s on the Linpack benchmark at 1.026 petaFLOPS, with a peak of 1.7 petaFLOPS. This hybrid architecture, combining AMD Opteron processors and IBM Cell Broadband Engine chips across 12,640 nodes, demonstrated scalable heterogeneous computing for scientific applications like climate modeling and nuclear simulations.

Internationally, Japan's Fugaku supercomputer, developed by RIKEN and Fujitsu and operational from 2020, achieved 442 petaFLOPS on the TOP500 list while pioneering the integration of HPC with artificial intelligence workloads through dedicated AI accelerators and software frameworks. Fugaku's Arm-based A64FX processors enabled hybrid simulations, such as drug discovery and earthquake modeling combined with machine learning, marking a milestone in versatile, high-impact computing for multidisciplinary research.

Exascale computing efforts reached an initial milestone with the Frontier supercomputer at Oak Ridge National Laboratory in 2022, the first to exceed 1 exaFLOP/s on the Linpack benchmark at 1.102 exaFLOPS, powered by AMD EPYC CPUs and Instinct MI250X GPUs in an HPE Cray EX system. This DOE-funded machine, comprising over 9,400 compute nodes, advanced capabilities in areas like fusion energy research and materials science, while emphasizing power efficiency with 52.7 gigaFLOPS per watt on the Green500 list as of June 2022. Subsequent systems continued the exascale era, including Aurora at Argonne National Laboratory (2023, 1.012 exaFLOPS) and El Capitan at Lawrence Livermore National Laboratory (2024, 1.742 exaFLOPS), the latter ranking as the world's fastest as of June 2025.

Architectures and Technologies

Hardware Components

High-performance computing (HPC) systems are built from specialized hardware components optimized for parallelism, massive throughput, and reliability under extreme workloads. These include advanced processors for computation, high-speed interconnects for communication, high-bandwidth memory and storage for data access, cooling solutions to manage thermal loads, and scalable node architectures that form the system's backbone.

Processors are the primary computational engines in HPC nodes, with multi-core central processing units (CPUs) providing versatile general-purpose performance. Intel Xeon and AMD EPYC processors dominate this space, featuring dozens to hundreds of cores per chip to handle parallel tasks in simulations and data analytics; for example, the AMD EPYC 9005 series offers up to 192 cores with thermal design powers ranging from 155 to 500 W, enabling efficient scaling in large clusters. Graphics processing units (GPUs) excel in highly parallel operations like tensor computations, with the NVIDIA H100 Tensor Core GPU delivering up to 67 TFLOPS of single-precision floating-point performance and 3.35 TB/s of bandwidth through its HBM3 memory, achieving up to 3 times the throughput of the A100 for HPC workloads. Accelerators such as field-programmable gate arrays (FPGAs) provide reconfigurable hardware for custom algorithms, with VP1902-class FPGAs demonstrating up to 2.5 TFLOPS of peak floating-point performance in scientific applications due to their adaptable logic blocks.

Interconnects enable low-latency data exchange across thousands of nodes, critical for tightly coupled parallel workloads. High-speed networks like NVIDIA's Quantum InfiniBand support bandwidths up to 800 Gbps per port with the XDR generation as of 2025, incorporating in-network computing engines to offload tasks like collective operations and reduce CPU overhead in supercomputers. Ethernet variants, such as 400 GbE, serve as alternatives for cost-sensitive deployments, though InfiniBand's lower latency makes it preferable for tightly synchronized HPC tasks.

Memory systems in HPC prioritize bandwidth to feed processors without bottlenecks. High-bandwidth memory (HBM) uses 3D-stacked DRAM to deliver over 3 TB/s of throughput with HBM3E, as seen in GPU integrations where it supports simultaneous access from multiple cores in memory-intensive simulations. For storage, non-volatile memory express (NVMe) solid-state drives leverage PCIe interfaces for high input/output operations per second (IOPS), providing up to 10 million IOPS per server in parallel setups to accelerate data loading in large-scale computations.

Cooling technologies are essential to sustain performance amid rising power demands. Modern HPC racks often exceed 20 kW, with AI workloads reaching up to 120 kW per rack as of 2025, necessitating advanced methods like liquid immersion cooling, which submerges components in non-conductive fluids for efficient heat dissipation, and direct-to-chip liquid cooling, which targets hotspots to handle heat fluxes over 350 W/cm². These solutions maintain operational temperatures below critical thresholds, preventing thermal throttling in dense deployments.

Node designs determine system scalability and performance. HPC clusters comprise loosely coupled servers interconnected via high-speed networks, allowing modular and cost-effective scaling to exascale levels through off-the-shelf components. In contrast, massively parallel processing (MPP) systems use tightly coupled nodes with specialized memory hierarchies and custom interconnects, optimizing for high-throughput workloads like weather modeling where low inter-node latency is paramount.
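Because sustained memory bandwidth, rather than peak FLOPS, often limits real HPC kernels, node-level bandwidth is commonly estimated with a streaming kernel. The sketch below is a minimal C/OpenMP example loosely modeled on the STREAM triad; the array size and timing approach are simplifying assumptions, and it is not the official STREAM benchmark:

```c
/* Minimal STREAM-style triad sketch for estimating sustained memory
 * bandwidth on a single node; results are rough estimates only. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1L << 26)   /* ~67 million doubles per array (~512 MiB each) */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    #pragma omp parallel for                     /* initialize (and touch) pages in parallel */
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];                /* triad: 2 reads + 1 write per element */
    double t1 = omp_get_wtime();

    double bytes = 3.0 * N * sizeof(double);     /* bytes moved by the triad */
    printf("approx. sustained bandwidth: %.1f GB/s\n", bytes / (t1 - t0) / 1e9);
    free(a); free(b); free(c);
    return 0;
}
```

Compiled with, for example, gcc -O3 -fopenmp, the reported figure approximates the node's sustained bandwidth and can be compared against the DDR or HBM peak quoted by the vendor.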

Software and Programming Models

High-performance computing (HPC) relies on specialized software ecosystems to exploit the capabilities of parallel hardware architectures, enabling efficient execution of computationally intensive tasks across distributed- and shared-memory systems. These software components include programming models, numerical libraries, job scheduling systems, operating systems, and development tools, all designed to optimize performance, portability, and scalability in large-scale environments.

Parallel programming models form the foundation for developing HPC applications, with the Message Passing Interface (MPI) serving as the de facto standard for distributed-memory systems since its initial specification in 1994. MPI provides a portable API for point-to-point communication, collective operations, and process management, allowing programs to coordinate across multiple nodes in a cluster. The current version, MPI-5.0, released in 2025, extends support for advanced features like persistent communication and one-sided operations to enhance efficiency on modern interconnects. For shared-memory parallelism within a single node, OpenMP offers a directive-based approach that simplifies multi-threading in C, C++, and Fortran codes without explicit thread management. OpenMP, standardized by the OpenMP Architecture Review Board, enables scalable loop-level and task-based parallelism, with version 6.0 (2024) extending support for accelerators like GPUs.

Key libraries abstract low-level hardware operations, facilitating high-performance numerical computations. The Basic Linear Algebra Subprograms (BLAS) provide optimized routines for vector and matrix operations, forming a foundational layer for many scientific algorithms. Built upon BLAS, the Linear Algebra PACKage (LAPACK) delivers high-level solvers for systems of linear equations, eigenvalue problems, and singular value decompositions, ensuring numerical robustness and efficiency in dense linear algebra tasks. For GPU-accelerated computing, NVIDIA's Compute Unified Device Architecture (CUDA) enables direct programming of graphics processing units through a C/C++-like extension, supporting kernel launches, memory management, and thread hierarchies to achieve massive parallelism.

Job scheduling systems manage resource allocation and workload distribution in HPC clusters to maximize utilization and minimize wait times. Slurm, an open-source solution widely adopted since 2003, handles job queuing, dependency resolution, and fault-tolerant scheduling for Linux-based clusters of varying scales. Similarly, the Portable Batch System (PBS), originating from NASA's 1990s implementation and now available as the open-source OpenPBS, supports batch job submission, priority-based queuing, and integration with diverse hardware configurations.

Optimized operating systems underpin HPC deployments, with Linux distributions tailored for cluster environments. Rocks, a cluster distribution based on CentOS, automates the installation and configuration of compute nodes, frontend servers, and networking for rapid cluster provisioning.

Debugging, profiling, and optimization tools are essential for identifying bottlenecks and ensuring reliability in parallel codes. Valgrind, a dynamic instrumentation framework, detects memory leaks, invalid accesses, and threading errors at runtime, supporting multiple architectures including x86 and ARM. The TAU (Tuning and Analysis Utilities) toolkit provides portable profiling and tracing for parallel programs in languages such as C, C++, and Fortran, capturing metrics such as execution time, hardware counters, and I/O events to guide optimization.
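To make the hybrid model concrete, the following minimal C sketch combines the two standards described above: MPI ranks split a global summation across processes (typically one or a few per node), and OpenMP threads share each rank's slice. The workload, summing the first N integers, is purely illustrative, and the example assumes an MPI compiler wrapper such as mpicc with OpenMP enabled:

```c
/* Hybrid MPI + OpenMP sketch: each MPI rank sums its slice of 1..N using
 * OpenMP threads, then the partial sums are combined with MPI_Reduce.
 * Compile with e.g.:  mpicc -O2 -fopenmp hybrid_sum.c -o hybrid_sum */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long N = 100000000L;                    /* illustrative problem size */
    long chunk = N / size;
    long begin = (long)rank * chunk + 1;
    long end   = (rank == size - 1) ? N : begin + chunk - 1;

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)   /* shared-memory parallelism */
    for (long i = begin; i <= end; i++)
        local += (double)i;

    double total = 0.0;                           /* distributed-memory reduction */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum computed by %d ranks x %d threads: %.0f\n",
               size, omp_get_max_threads(), total);

    MPI_Finalize();
    return 0;
}
```

A typical launch might be mpirun -np 4 ./hybrid_sum with OMP_NUM_THREADS set to the desired threads per rank; exact launcher names and flags vary by MPI implementation and site configuration.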

Performance Evaluation

TOP500 Project

The TOP500 project was launched in 1993 by Hans Meuer of the University of Mannheim, Erich Strohmaier, and Jack Dongarra of the University of Tennessee, Knoxville, as an initiative to track and analyze trends in high-performance computing systems. The first list was presented at the International Supercomputing Conference (ISC) in Mannheim, Germany, compiling performance data from 116 supercomputers based on voluntary submissions from owners and vendors. This effort aimed to provide a standardized, reliable snapshot of global HPC capabilities, evolving into a biannual ranking of the 500 most powerful non-distributed computer systems worldwide.

The project's methodology centers on the High-Performance Linpack (HPL) benchmark, a standardized test that ranks systems by their sustained floating-point operations per second (FLOPS) in solving a dense system of linear equations using LU factorization. HPL measures Rmax, the achieved performance on randomly generated matrices, expressed in teraflops (TFLOPS), petaflops (PFLOPS), or exaflops (EFLOPS), rather than theoretical peak performance (Rpeak). Submissions must include detailed hardware configurations, installation dates, and results validated against official HPL rules to ensure comparability, with the full list published alongside statistical analyses of architectural trends, processor types, and interconnects.

Updates occur twice annually: in June at the ISC High Performance conference in Germany, and in November at the SC (Supercomputing) conference in the United States. This cadence allows the list to reflect rapid advancements in HPC technology. Each edition includes not only the ranked systems but also aggregated data on performance growth, market shares by vendor and country, and emerging metrics like energy efficiency, which have become increasingly relevant as systems scale. As of the most recent list (June 2025), El Capitan retains the No. 1 position; the next list is scheduled for November 2025 at SC25.

The TOP500 has profoundly influenced the HPC ecosystem by spurring international competition among governments, research institutions, and manufacturers, leading to accelerated innovation in processor architectures and system designs. It has illuminated long-term trends, such as the exponential growth in computational capability alongside rising power demands, with top systems evolving from approximately 850 kW of consumption in 1993 (e.g., the inaugural No. 1 CM-5/1024) to multi-megawatt scales today, as exemplified by El Capitan's 30+ MW draw. However, critics argue that its focus on HPL performance overemphasizes raw speed at the expense of real-world efficiency, since Linpack results often represent an optimistic upper bound compared to diverse application workloads.
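For illustration only, the toy C kernel below shows how a FLOP rate is derived from an operation count and wall-clock time. It is not the HPL benchmark (which solves a large dense linear system via LU factorization with partial pivoting); a naive matrix multiply is used here simply because its operation count, 2n^3 floating-point operations, is easy to state:

```c
/* Toy FLOP-rate measurement: time a naive n x n matrix multiply and divide
 * the known operation count (2*n^3) by the elapsed time. Illustrative only;
 * this is NOT the HPL benchmark used for TOP500 rankings. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 512

static double A[N][N], B[N][N], C[N][N];

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (double)rand() / RAND_MAX;
            B[i][j] = (double)rand() / RAND_MAX;
            C[i][j] = 0.0;
        }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                C[i][j] += A[i][k] * B[k][j];    /* one multiply + one add */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double flops = 2.0 * N * N * N;              /* total floating-point operations */
    printf("%.2f GFLOPS in %.3f s\n", flops / secs / 1e9, secs);
    return 0;
}
```

Production benchmarks such as HPL add pivoting, blocking, inter-node communication, and verification of the computed solution, which is why reported Rmax values require carefully tuned runs rather than simple kernels like this one.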

Other Benchmarks and Metrics

Beyond the TOP500's focus on peak floating-point performance via the LINPACK benchmark, several other metrics evaluate high-performance computing (HPC) systems in terms of energy efficiency, memory-bound operations, input/output (I/O) capabilities, and application-specific workloads. These benchmarks address limitations of dense linear algebra tests by emphasizing real-world patterns such as sparse computations, data movement, and power consumption across diverse scientific domains.

The Green500 list ranks supercomputers by energy efficiency, measuring performance in gigaflops per watt (GFlops/W) to highlight power consumption alongside computational capability. Launched in 2007 as a complement to the TOP500, it promotes sustainable HPC designs by incentivizing low-power architectures, such as those using energy-efficient processors or advanced cooling. By the November 2024 list, leading systems had surpassed 65 GFlops/W; as of the June 2025 list, the top system (JEDI, the first module of the JUPITER exascale system) achieved 72.733 GFlops/W and remains No. 1. Trends show continued progress in efficiency for exascale systems, with the highest-ranked machines averaging roughly 11 MW of power consumption (up to 17 MW maximum).

The High-Performance Conjugate Gradients (HPCG) benchmark assesses systems on sparse matrix operations, which are prevalent in real-world simulations like fluid dynamics, climate modeling, and electromagnetics. Developed by Michael Heroux at Sandia National Laboratories and released in 2013, HPCG uses a multigrid-preconditioned conjugate gradient solver involving sparse matrix-vector multiplications, vector updates, and global reductions, typically yielding 5-10% of LINPACK performance due to its memory-intensive nature. It ranks systems biannually alongside TOP500 results, with top scores as of June 2025 reaching 17.4 petaflops on El Capitan (Frontier at 14.05 petaflops), underscoring the gap between theoretical peak and practical efficiency.

I/O benchmarks are crucial for petabyte-scale HPC environments, where data throughput can bottleneck computations. The IOR (Interleaved or Random) tool evaluates parallel bandwidth and latency using MPI-IO or POSIX interfaces, simulating workloads like large-block writes or shared-file access to measure sustained throughput in gigabytes per second. Complementing IOR, mdtest focuses on metadata operations such as file creation, deletion, and stat calls, testing performance under high-concurrency scenarios on parallel storage systems like Lustre or GPFS. In large-scale tests, top systems achieve over 100 GB/s of aggregate throughput, but variability arises from file striping and contention.

Application-specific benchmarks provide targeted insights into domain performance. The NAS Parallel Benchmarks (NPB), developed by NASA in the early 1990s, derive from computational fluid dynamics (CFD) kernels and pseudo-applications, evaluating efficiency through metrics like iterations per second on problems involving integer sort (IS), conjugate gradient (CG), and multigrid (MG) solvers. NPB suites scale from small (Class S) to massive (Class F) problems, revealing communication overheads in MPI or OpenMP implementations. Similarly, the Graph500 benchmark targets graph analytics, measuring traversed edges per second in breadth-first search (BFS) on synthetic scale-free graphs up to billions of vertices, as seen in Fugaku's top ranking as of November 2024 of 204.068 tera-traversed edges per second (TEPS), relevant to applications in social networks and bioinformatics.

Additional metrics emphasize holistic system value. Time-to-solution (TTS) quantifies the total duration to obtain a scientifically valid result, encompassing development, execution, and post-processing, rather than isolated FLOPS; for instance, optimizing TTS can reduce overall project timelines by integrating hardware-software co-design. Cost-effectiveness metrics, such as dollars per teraflop-hour, assess economic viability by dividing acquisition and operational costs (including energy) by delivered performance, with cloud-based HPC often achieving under $0.01 per teraflop-hour in 2025 deployments compared to on-premises systems. These approaches prioritize practical impact over raw speed.
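The efficiency and cost figures above are simple ratios; the short C sketch below works through both using hypothetical inputs (the performance, power, and hourly cost values are assumptions for illustration, not data for any ranked system):

```c
/* Illustrative calculation of two efficiency metrics discussed above:
 * energy efficiency (GFLOPS/W) and cost per teraflop-hour.
 * All input values are hypothetical assumptions. */
#include <stdio.h>

int main(void) {
    double rmax_gflops = 1.742e9;   /* assumed sustained performance in GFLOPS */
    double power_watts = 29.0e6;    /* assumed average power draw in watts */
    double hourly_cost = 4000.0;    /* assumed operating cost per hour, USD */

    double gflops_per_watt = rmax_gflops / power_watts;
    double tflops          = rmax_gflops / 1e3;   /* teraflop-hours delivered per hour */
    double cost_per_tfh    = hourly_cost / tflops;

    printf("energy efficiency: %.1f GFLOPS/W\n", gflops_per_watt);
    printf("cost: $%.4f per teraflop-hour\n", cost_per_tfh);
    return 0;
}
```

A real assessment would also amortize acquisition cost and account for utilization and idle time, but the structure of the calculation is the same.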

Applications

Scientific and Research Domains

High-performance computing (HPC) plays a pivotal role in advancing fundamental scientific research across diverse domains, enabling simulations that capture complex phenomena at scales unattainable by traditional methods. In climate modeling, general circulation models (GCMs) and Earth system models (ESMs) such as CESM2 integrate atmospheric, oceanic, and biogeochemical processes to generate projections for Intergovernmental Panel on Climate Change (IPCC) assessments. These models demand enormous computational resources to achieve high resolutions (targeting 1 km globally or finer regionally) while running large ensembles of 100 or more members to quantify uncertainties in phenomena like sea-level rise and extreme weather. For instance, CMIP6 simulations under CESM2 require approximately 260 million core-hours to produce over 25,000 simulated years, handling data volumes that escalate to exabytes due to high-resolution outputs and the integration of observational data. Recent exascale systems have further accelerated these simulations, enabling higher-fidelity Earth system modeling as of 2025.

In cosmology, HPC facilitates N-body simulations that model the gravitational dynamics of dark matter and baryonic matter, elucidating galaxy formation and the large-scale structure of the universe. A seminal example is the Millennium Simulation of 2005, which tracked the evolution of 10 billion particles in a cubic volume spanning 2.23 billion light-years, from z = 127 to the present day, on supercomputers at the Max Planck Society's computing facility in Garching, Germany. This effort, involving over 343,000 CPU-hours, revealed hierarchical structure growth, provided constraints on cosmological parameters, and linked galaxy properties to cosmic clustering.

Bioinformatics leverages HPC for processing vast genomic datasets and predicting biomolecular structures, accelerating discoveries in biology and medicine. Since 2020, AlphaFold, developed by DeepMind, has utilized powerful computing clusters to predict the 3D structures of over 200 million proteins with near-atomic accuracy, transforming protein folding research that previously required years of experimentation. These predictions, generated in minutes per structure, support genome sequencing analyses by elucidating protein functions and interactions, with applications in drug design and disease modeling; the AlphaFold Protein Structure Database now aids over 2 million researchers worldwide. As of 2024, AlphaFold 3 extends these capabilities to predict interactions with DNA, RNA, and ligands.

Particle physics relies on HPC for lattice quantum chromodynamics (QCD) calculations, which simulate quark-gluon interactions to probe the strong force underlying atomic nuclei. Lattice QCD formulations discretize spacetime to compute non-perturbative effects, essential for interpreting data from experiments like the Large Hadron Collider (LHC) at CERN. These simulations, performed on exascale prototypes, validate predictions for hadron masses and scattering processes, with algorithmic advances achieving over 50-fold speedups since 2016 to handle physical volumes approaching continuum limits.

Despite these advances, HPC in scientific simulations faces significant challenges, particularly in data visualization and uncertainty quantification (UQ). Visualization tools must render petabyte-scale outputs from multiphysics models into interpretable forms, often employing in-situ techniques to process data during a run and avoid I/O bottlenecks. UQ, crucial for assessing model reliability in fields such as climate projection and engineering design, involves propagating uncertainties through high-dimensional parameter spaces, requiring efficient parallel algorithms on exascale systems to generate statistically robust ensembles without prohibitive computational costs.

Engineering and Commercial Uses

High-performance computing (HPC) plays a pivotal role in engineering and commercial sectors by enabling complex simulations and optimizations that drive innovation and cost efficiencies in profit-oriented applications. In industries such as aerospace, automotive, finance, oil and gas, and pharmaceuticals, HPC systems process vast datasets to model real-world scenarios, accelerating design cycles while minimizing physical prototyping expenses.

In aerospace, HPC is essential for computational fluid dynamics (CFD) simulations that predict aircraft aerodynamics and structural integrity during design phases. For instance, Boeing utilizes HPC clusters to perform detailed CFD analyses for the 777X aircraft, simulating airflow over wings and fuselages to optimize fuel efficiency, achieving up to 20% improvement in fuel use and emissions compared to predecessors through advanced wing designs that reduce drag. These simulations, running on thousands of cores, allow engineers to iterate designs virtually, cutting development time from years to months and avoiding costly physical tests.

The automotive industry leverages HPC for crash simulations and for training machine learning models for autonomous vehicles, enhancing safety and performance predictions. Carmakers employ finite element analysis on HPC clusters to model vehicle collisions, evaluating material behaviors under extreme forces to refine vehicle structures and occupant protection systems. A prominent example is Tesla's Dojo supercomputer (operational 2023–2025), which processed petabytes of video data from its vehicle fleet to train neural networks for self-driving capabilities, achieving real-time inference speeds necessary for production deployment before its discontinuation in 2025 to focus on newer AI infrastructure.

In finance, HPC facilitates Monte Carlo simulations for risk management and derivatives pricing, handling billions of probabilistic scenarios to forecast market volatilities and comply with regulatory requirements. Major banks run these simulations on dedicated HPC infrastructure to value complex derivatives and stress-test assets against economic shocks, thereby improving capital allocation decisions. This capability has become standard since the 2008 financial crisis, with HPC reducing computation times from days to hours for value-at-risk calculations.

Oil and gas exploration relies on HPC for seismic imaging and reservoir modeling, which interpret underground structures to locate hydrocarbon deposits efficiently. Companies like ExxonMobil use reverse time migration algorithms on supercomputers to process terabytes of seismic data, generating high-resolution 3D images that identify viable drilling sites with greater accuracy. Such applications have reduced exploration costs by 30-50% through fewer dry wells and optimized recovery strategies, as evidenced by case studies from the 2010s onward.

Pharmaceutical drug discovery employs HPC for molecular dynamics simulations that screen potential compounds by modeling protein-ligand interactions at atomic scales. Drug makers accelerate lead identification using GPU-accelerated HPC systems, simulating billions of molecular configurations to predict binding affinities and efficacy, a process that has shortened discovery timelines from 5-7 years to under 3 years in some pipelines since the 2010s. This approach, powered by GPU-accelerated molecular dynamics codes running on large clusters, has contributed to breakthroughs in targeted therapies by enabling virtual screening of vast chemical libraries.

Modern Implementations

HPC in the Cloud

High-performance computing (HPC) in the cloud refers to the provision of scalable computational resources over the internet, allowing users to access powerful clusters without owning physical infrastructure. This model has gained prominence as cloud providers integrate HPC capabilities into their platforms, enabling rapid deployment of clusters for complex simulations and data analysis. Since the 2010s, major expansions in cloud infrastructure have made HPC accessible to a broader range of organizations, shifting from traditional on-premises systems to virtualized environments that support bursty workloads and global collaboration.

Key providers dominate the cloud HPC landscape. Amazon Web Services (AWS) offers HPC clusters through services like AWS Batch and AWS ParallelCluster, which automate the setup of environments for tasks such as weather modeling and large-scale simulation. Microsoft Azure provides Azure Batch, a managed service for running large-scale parallel and HPC jobs, supporting containerized workloads and integration with on-premises systems. Google Cloud delivers the Cluster Toolkit (formerly Cloud HPC Toolkit), an open-source tool launched in 2022 to simplify the deployment of turnkey HPC clusters using best practices for networking and storage. These offerings, building on infrastructure expansions since the early 2010s, have democratized access to exascale-level performance for non-specialist users.

Cloud HPC provides several advantages over dedicated on-premises clusters. On-demand scaling allows resources to expand or contract dynamically, accommodating variable workloads without overprovisioning. The pay-as-you-go pricing model eliminates upfront investments, converting capital expenditures (CapEx) to operational expenses (OpEx) and potentially reducing costs by up to 90% through spot instances for non-critical jobs. Global accessibility further enhances collaboration, as teams can access compute from any location with network connectivity, fostering faster iteration in distributed research and development.

Despite these benefits, cloud HPC faces notable challenges. While early cloud networks were limited to 100 Gbps Ethernet, as of 2025 providers offer up to 400 Gbps with low-latency options like RDMA, though latencies remain potentially higher than those of on-premises interconnects, which achieve the sub-microsecond delays essential for tightly coupled simulations. Security concerns also arise, particularly for sensitive simulations involving proprietary or classified data, requiring robust encryption, compliance with standards like NIST SP 800-223, and strict access controls to mitigate risks of breaches in shared environments.

Hybrid models address some limitations by combining on-premises and cloud resources. Cloud bursting enables organizations to handle peak loads by seamlessly extending local clusters to the cloud, maintaining low-latency operations for core tasks while offloading surges. Major providers support hybrid bursting through integration with existing on-premises schedulers, allowing automatic resource allocation during high-demand periods. A prominent case study illustrates these capabilities: NASA's Center for Climate Simulation utilized AWS through the Science Managed Cloud Environment (SMCE) for large-scale climate modeling and data analytics, leveraging bursting to accelerate processing for research projects. This approach reduced project durations and achieved substantial cost efficiencies, aligning with broader reports of up to 90% savings in HPC workloads via optimized cloud usage.

Leading Supercomputers

As of the June 2025 TOP500 ranking (the latest available as of November 2025), the world's leading supercomputers are primarily exascale systems capable of over one exaFLOPS of performance on the High-Performance LINPACK benchmark, marking a significant advancement in computational scale for scientific simulations and AI-driven research. The top-ranked system, El Capitan at Lawrence Livermore National Laboratory in the United States, delivers 1.742 exaFLOPS using an HPE Cray EX255a architecture with 4th Gen AMD EPYC CPUs and AMD Instinct MI300A accelerators interconnected via Slingshot-11, enabling breakthroughs in national-security stockpile stewardship simulations and real-world scientific applications like climate modeling. In second place, Frontier at Oak Ridge National Laboratory achieves 1.353 exaFLOPS with HPE Cray EX235a nodes featuring 3rd Gen AMD EPYC processors and AMD Instinct MI250X GPUs over Slingshot-11, supporting key research in fusion energy, materials science, and the biosciences. Aurora, third on the list at Argonne National Laboratory, provides 1.012 exaFLOPS through an HPE Cray EX system with Intel Xeon CPU Max 9470 processors and Intel Data Center GPU Max accelerators connected by Slingshot-11, focusing on fusion simulations, climate analysis, and AI-for-science initiatives such as aircraft design optimization. Other prominent systems include Fugaku at RIKEN in Kobe, Japan, which sustains 442 petaFLOPS using Fujitsu's A64FX ARM-based processors and the Tofu-D interconnect, continuing to excel in broad scientific computations, including earthquake modeling and graph analytics, despite its 2020 debut. LUMI in Kajaani, Finland, ranking ninth at 379.7 petaFLOPS, employs HPE Cray EX235a nodes with 3rd Gen AMD EPYC CPUs and AMD Instinct MI250X GPUs, powering European research in climate science, ocean simulation, and artificial intelligence while running on hydroelectric power for low-carbon operation.
Rank | System | Site (Country) | Rmax (PFlop/s) | Architecture / Key Components | Power Efficiency (GFlop/s/W)
1 | El Capitan | LLNL (United States) | 1,742 | 4th Gen AMD EPYC + Instinct MI300A, Slingshot-11 | 60.3
2 | Frontier | ORNL (United States) | 1,353 | 3rd Gen AMD EPYC + Instinct MI250X GPUs, Slingshot-11 | 55.0
3 | Aurora | ANL (United States) | 1,012 | Intel Xeon CPU Max 9470 + Data Center GPU Max, Slingshot-11 | 26.2
7 | Fugaku | RIKEN R-CCS (Japan) | 442.0 | Fujitsu A64FX ARM CPUs, Tofu-D | 14.8
9 | LUMI | CSC/EuroHPC (Finland) | 379.7 | 3rd Gen AMD EPYC + Instinct MI250X GPUs, Slingshot-11 | 53.4
These leading systems exemplify a shift toward heterogeneous architectures integrating CPUs with specialized accelerators like GPUs for enhanced efficiency. Power consumption for such machines typically ranges from 20 to 30 megawatts, balancing exascale performance with energy constraints through advanced cooling and interconnect technologies. The U.S. Exascale Computing Project, a collaborative initiative involving the Department of Energy and industry partners, targeted the deployment of systems exceeding 2 exaFLOPS of peak performance to address complex scientific simulations, with key milestones like the Aurora supercomputer at Argonne National Laboratory, which reached 1.012 exaFLOPS upon full operation in 2025. Beyond exascale, visions for zettascale computing, representing a further thousandfold increase in performance, are emerging as the next frontier, with early roadmaps such as Intel's 2021 vision aiming for this scale by 2027 through advanced processor architectures and scalable interconnects, while international efforts like Japan's FugakuNEXT aim for similar targets by 2030. These developments emphasize hybrid architectures combining traditional CPUs with specialized accelerators to overcome current scaling challenges in power and data movement.

Integration of artificial intelligence and machine learning into high-performance computing is advancing through neuromorphic systems, which emulate neural structures for energy-efficient processing of hybrid workloads that blend traditional simulations with AI-driven analytics. These systems promise improved performance and reduced power demands compared to conventional HPC resources, enabling applications like adaptive climate modeling. Complementing this, tensor processing units (TPUs) from Google facilitate accelerated execution of AI tasks within broader HPC environments, scaling cost-efficiently for training and inference in cloud-based hybrid setups. Such integrations are pivotal for handling the growing convergence of simulation and machine learning in data-intensive computations, with AI-coupled workflow frameworks optimizing resource use across HPC clusters.

Hybrid quantum-classical high-performance computing systems represent an early fusion of quantum processors with classical infrastructure, particularly for tackling optimization problems intractable on traditional hardware. IBM's quantum-centric supercomputing approach combines quantum bits for exploratory searches with HPC for refinement, demonstrating advantages in areas like supply chain logistics and molecular design. These systems leverage software like Qiskit to interface quantum and classical components seamlessly, enabling scalable solutions for complex combinatorial challenges.

Sustainability efforts in high-performance computing focus on carbon-neutral designs, with facilities like RWTH Aachen's CLAIX-2023 system powered exclusively by renewable energy sources to minimize emissions during operation. Companies such as HPC AG have achieved full climate neutrality by compensating for all greenhouse gases and integrating green power, setting precedents for industry-wide adoption. Photonic interconnects further enhance efficiency by replacing electrical links with light-based transmission, potentially reducing energy consumption by up to 50% in data movement-heavy workloads while maintaining high bandwidth.

Edge high-performance computing extends distributed processing to Internet of Things ecosystems, enabling real-time analytics at the network periphery for applications in current 5G and prospective 6G environments. This paradigm supports low-latency computations on resource-constrained devices, such as sensor networks in smart cities, by offloading intensive tasks from central clouds to edge nodes. With next-generation network visions emphasizing ultra-low-latency communications, edge HPC will facilitate hyper-distributed architectures for seamless data processing and analysis.

References

  1. [1]
    What is High Performance Computing? | U.S. Geological Survey
    High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance.
  2. [2]
    [PDF] Introduction to High Performance Computing - Boston University
    • High Performance Computing (HPC) refers to the practice of aggregating computing power in order to solve large problems in science, engineering, or business. ...Missing: definition | Show results with:definition
  3. [3]
    [PDF] History and overview of high performance computing
    design began in 1966; goal was 1 GFLOP/s and estimated cost was. $8 million. “finished” in 1971-1972 at a cost of $31 million and a top speed well.
  4. [4]
    Why HPC? - Purdue's RCAC
    High-Performance Computing (HPC) refers to the use of powerful computers and advanced software tools to perform complex calculations and simulations that ...Missing: definition | Show results with:definition
  5. [5]
    High Performance Computing | NIST
    Oct 18, 2010 · Our High Performance Computing (HPC) program enables work on challenging problems that are beyond the capacity and capability of desktop computing resources.Missing: definition | Show results with:definition
  6. [6]
    Supercomputing - Department of Energy
    Supercomputing - also known as high-performance computing - is the use of powerful resources that consist of multiple computer systems working in parallel ...
  7. [7]
    What Is High-Performance Computing (HPC)? - IBM
    HPC is a technology that uses clusters of powerful processors that work in parallel to process massive, multidimensional data sets and solve complex problems ...Missing: authoritative | Show results with:authoritative
  8. [8]
    High-Throughput Computing (HTC) and its Requirements
    In contrast, High Performance Computing (HPC) environments deliver a tremendous amount of compute power over a short period of time. HPC environments are often ...
  9. [9]
    High‐Throughput Computing Versus High‐Performance Computing ...
    Jan 28, 2015 · HTC differs from high-performance computing (HPC), where rapid interaction of intermediate results is required to perform the computations.
  10. [10]
    The Evolution of HPC | Inside HPC & AI News
    Aug 24, 2016 · The name supercomputer fell out of favor and the term “HPC systems” was used to described high performance clusters. Individual servers were ...
  11. [11]
    Understanding Exascale
    While these petascale systems are quite powerful, the next milestone in computing achievement is the exascale—a higher level of performance in computing that ...
  12. [12]
    Exascale Computing Takes Research to the Next Level
    Jul 7, 2023 · Before exascale, the fastest supercomputers in the world could handle problems at the petascale, or one quadrillion operations each second.
  13. [13]
    9.3. Parallel Design Patterns — Computer Systems Fundamentals
    Task parallelism refers to decomposing the problem into multiple sub-tasks, all of which can be separated and run in parallel. Data parallelism, on the other ...
  14. [14]
    [PDF] A Primer on Parallel Programming - Princeton Research Computing
    Oct 18, 2021 · Differentiate parallel paradigms by how memory and communication is managed. Shared Memory. Distributed Memory. Hybrid, Distributed-Shared- ...
  15. [15]
    Introduction to Parallel Computing Tutorial - | HPC @ LLNL
    Amdahl's Law states that potential program speedup is defined by the fraction of code (P) that can be parallelized: 1 speedup = -------- 1 - P. If none of ...
  16. [16]
    [PDF] Validity of the Single Processor Approach to Achieving Large Scale ...
    Amdahl. TECHNICAL LITERATURE. This article was the first publica- tion by Gene Amdahl on what became known as Amdahl's Law. Interestingly, it has no equations.
  17. [17]
    Reevaluating Amdahl's Law and Gustafson's Law - Temple CIS
    Amdahl's Law and Gustafson's Law are used to estimate speedups in parallel processing. They are mathematically equivalent, with only two different formulations.
  18. [18]
    (PDF) Gustafson's Law - ResearchGate
    In a 1967 conference debate over the merits of parallel computing, IBM's Gene Amdahl argued · He asserted that this would sharply limit the approach of parallel ...Missing: original | Show results with:original
  19. [19]
    Memory, Cache, Interconnects - Cornell Virtual Workshop
    Hierarchical memory—from RAM on multiple nodes, through successive levels of cache—is used to feed data down to a single core's hardware registers for ...
  20. [20]
    17.1. Intro to Parallel Computing
    Shared vs. Distributed Memory#. Shared Memory: In a shared memory model, multiple processors access the same memory space. This allows for efficient ...
  21. [21]
    Glossary & Terminology - HPC@UMD
    Distributed memory parallelism is a paradigm for parallel computing in which the parallel processes do not all share a global memory address space. Because of ...<|separator|>
  22. [22]
    DOE Explains...Exascale Computing - Department of Energy
    One way scientists measure computer performance is in floating point operations per second (FLOPS). These involve simple arithmetic like addition and ...
  23. [23]
    The world's first general purpose computer turns 75 | Penn Today
    Feb 11, 2021 · Designed by John Mauchly and J. Presper Eckert, ENIAC was the fastest computational device of its time, able to do 5,000 additions per second, ...
  24. [24]
    5 Lessons From History | Funding a Revolution: Government Support for Computing Research | The National Academies Press
    ### Summary of Early Government Funding for High-Performance Computing (1940s-1970s)
  25. [25]
    [PDF] ENIAC: The “first” electronic computer
    18,000 vacuum tubes, 70, 000 resistors, etc.. It also generated lots of heat and the cooling systems weighted a couple of tons. /. The following machine ...
  26. [26]
  27. [27]
    [PDF] L.1 Introduction L-2 L.2 The Early Development of Computers ...
    In 1964, Control Data delivered the first supercomputer, the CDC 6600. As. Thornton [1964] discussed, he, Cray, and the other 6600 designers were among the ...
  28. [28]
    [PDF] COMPUTER ARCHITECTURE TECHNIQUES FOR POWER ...
    Power dissipation issues have catalyzed new topic areas in computer architecture, resulting in a substantial body of work on more power-efficient architectures.
  29. [29]
    [PDF] ILLIAC IV
    ILLIAC IV is a milestone in computer development in that it provides a level of parallel processing many times that of conventional designs. To achieve this ...
  30. [30]
    [PDF] LIBRARY80Y - NASA Technical Reports Server (NTRS)
    to Honeywell back to Burroughs and funding agencies for ILLIAC IV went from the Advanced Research Project Agency (ARPA)to ARPAplus other government agencies ...
  31. [31]
    About | TOP500
    The TOP500 project was launched in 1993 to improve and renew the Mannheim supercomputer statistics, which had been in use for seven years.Missing: history | Show results with:history
  32. [32]
    [PDF] New Day Dawns in Supercomputing
    The goal of the Accelerated Strategic Computing Initiative. (ASCI) is to provide the numerical simulation capability needed to model the safety, reliability, ...
  33. [33]
    [PDF] The Green500 List: Year Two - TOP500
    More specifically, from November 2007 to November 2009, the maxi- mum energy efficiency increased by 102% from 357. MFLOPS/W to 723 MFLOPS/W while the average ...
  34. [34]
    [PDF] The Green500 List: Encouraging Sustainable Supercomputing
    Dec 2, 2007 · Beginning with the November. 2007 Green500 List, we'll use metered measurements in rankings whenever available. As the list matures, we.
  35. [35]
    Breaking the petaflop barrier - IBM
    Roadrunner broke the petaflop barrier on May 25, 2008, but not without some last-minute drama. The day's first hurdle was simply to ensure the machine could ...Missing: 1.7 | Show results with:1.7
  36. [36]
    Roadrunner Supercomputer Breaks the Petaflop Barrier - OSTI
    Jun 9, 2008 · At 3:30 a.m. on May 26, 2008, Memorial Day, the "Roadrunner" supercomputer exceeded a sustained speed of 1 petaflop/s, or 1 million billion ...Missing: 1.7 | Show results with:1.7
  37. [37]
    Japan's Fugaku gains title as world's fastest supercomputer | RIKEN
    Jun 23, 2020 · The supercomputer Fugaku, which is being developed jointly by RIKEN and Fujitsu Limited based on Arm® technology, has taken the top spot on the Top500 list.
  38. [38]
    AI for Science Platform Division | RIKEN Center for Computational ...
    This division's responsibilities extend to the integration of the 'Fugaku' supercomputer, with a novel AI-dedicated supercomputing system. This integration ...
  39. [39]
    At the Frontier: DOE Supercomputing Launches the Exascale Era
    Jun 7, 2022 · Frontier broke the exascale limit, reaching 1.1 exaflops of performance on the High-Performance Linpack benchmark. Exascale performance is ...
  40. [40]
    Frontier supercomputer hits new highs in third year of exascale | ORNL
    Nov 18, 2024 · The Frontier team achieved a High-Performance Linpack, or HPL, score of 1.35 exaflops, or 1.35 quintillion calculations per second using double- ...
  41. [41]
    High Performance Computing (HPC) Architecture - Intel
    Oftentimes, these components include a CPU and an accelerator such as an FPGA or GPU, plus memory, storage, and networking components. Nodes or servers of ...
  42. [42]
    5th Gen AMD EPYC™ Processors Elevate HPC and AI Workloads to ...
    Oct 14, 2024 · AMD EPYC 9005 Series Processors offer from 8 to 192 cores and TDPs ranging from 155 to 500W. I've selected the 5th Gen 64-core high frequency ...
  43. [43]
    [PDF] NVIDIA A100 | Tensor Core GPU
    The A100 is a powerful GPU with 80GB memory, 19.5 TFLOPS FP32, 9.7 TFLOPS FP64, 2TB/s memory bandwidth, and 20x higher performance than Volta. It can scale up ...
  44. [44]
    High Performance Computing Using FPGAs (WP375)
    Advancements in silicon, software, and IP have proven Xilinx FPGAs to be the ideal solution for accelerating applications on high-performance embedded ...
  45. [45]
    NVIDIA ConnectX InfiniBand Adapters
    The ConnectX-7 smart host channel adapter (HCA) provides ultra-low latency, 400Gb/s throughput, and innovative NVIDIA In-Network Computing acceleration engines ...
  46. [46]
    NVIDIA Quantum-2 InfiniBand Platform
    NVIDIA Quantum-2 empowers the world's leading supercomputing data centers with software-defined networking, In-Network Computing, performance isolation.
  47. [47]
    Performance evaluation of High Bandwidth Memory for HPC ...
    In this paper, we study the performance of latency and energy of one such 3D stacked memory, namely the High Bandwidth Memory (HBM).
  48. [48]
    [PDF] What Modern NVMe Storage Can Do, And How To Exploit It
    NVMe SSDs based on flash are cheap and offer high throughput. Combining several of these devices into a single server enables. 10 million I/O operations per ...
  49. [49]
    Chapter 20: Thermal - IEEE Electronics Packaging Society
    Jun 19, 2019 · However, these conflicting trends have resulted in a substantial increase in both heat flux >350 W/cm² and power density, which reduced the ...
  50. [50]
    Thermal Performance Characteristic of Single-phase Immersion ...
    Therefore, liquid cooling has more attention to address the thermal management challenges from increasing TDP of chips and rack power density in DCs. Among ...
  51. [51]
    Computer Clusters and MPP Architectures - BrainKart
    Mar 24, 2017 · In this section, we will start by discussing basic, small-scale PC or server clusters. We will discuss how to construct large-scale clusters and MPPs in ...
  52. [52]
    [PDF] A Message-Passing Interface Standard - MPI Forum
    Nov 2, 2023 · This document describes the Message-Passing Interface (MPI) standard, version 4.1. The MPI standard includes point-to-point message-passing, ...
  53. [53]
    Specifications - OpenMP
    Sep 15, 2025 · The OpenMP API supports multi-platform shared-memory parallel programming in C/C++ and Fortran. The OpenMP API defines a portable, scalable model.
  54. [54]
    BLAS (Basic Linear Algebra Subprograms) - The Netlib
    The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for performing basic vector and matrix operations.BLAS Technical Forum · FAQ · Blas/gemm_based · BLAS(Legacy Website)
  55. [55]
    LAPACK — Linear Algebra PACKage - The Netlib
    LAPACK is a software package providing routines for solving linear equations, least-squares, eigenvalue, and singular value problems. It is freely available.
  56. [56]
    CUDA C++ Programming Guide
    The programming guide to the CUDA model and interface.
  57. [57]
    Overview - Slurm Workload Manager - SchedMD
    Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
  58. [58]
    OpenPBS Open Source Project
    OpenPBS software optimizes job scheduling and workload management in high-performance computing (HPC) environments - clusters, clouds, and supercomputers - ...
  59. [59]
    Rocks Cluster
    Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization ...
  60. [60]
    Valgrind Home
    Official Home Page for valgrind, a suite of tools for debugging and profiling. Automatically detect memory management and threading bugs, ...
  61. [61]
    TAU - Tuning and Analysis Utilities - Computer Science
    TAU Performance System® is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, UPC, Java, Python.
  62. [62]
    [PDF] The TOP500 Project
    Jan 20, 2008 · The TOP500 project was launched in 1993 to provide a reliable basis for tracking and detecting trends in high ...
  63. [63]
    TOP500 Founder Erich Strohmaier on the List's Evolution
    The TOP500 list of the world's fastest supercomputers first debuted more than two decades ago, in June 1993, the brainchild of Berkeley Lab scientist Erich ...
  64. [64]
    List Statistics | TOP500
    List Statistics: Rmax and Rpeak values are in GFlops. For more details about other fields, check the TOP500 description.
  65. [65]
    HPC Power Efficiency and the Green500 - HPCwire
    Nov 20, 2013 · The first Green500 List was launched in November 2007 ranking the energy efficiency of supercomputers. Co-founder Kirk W. Cameron discusses ...
  66. [66]
    El Capitan: NNSA's first exascale machine
    While it is one of the world's most energy-efficient supercomputers, El Capitan requires about 30 megawatts (MW) of energy to run at peak—enough power to run a ...
  67. [67]
    HPCG Benchmark
    HPCG (High Performance Conjugate Gradients) is a metric to rank HPC systems, complementing High Performance LINPACK (HPL).
  68. [68]
    Green500 - TOP500
    The 25th Green500 List was published June 14, 2025, in Hamburg. The 24th Green500 List was published Nov. 19, 2024, in Atlanta, GA.
  69. [69]
    JUPITER Sets New Energy Efficiency Standards with #1 Ranking on ...
    Oct 4, 2025 · The first module of the exascale supercomputer JUPITER is ranked first place in the Green500 list of the most energy-efficient supercomputers
  70. [70]
    [PDF] High-performance conjugate-gradient benchmark: A new metric for ...
    The high-performance conjugate-gradient (HPCG) benchmark is used to test a high-performance computing (HPC) machine's ability to solve these important ...
  71. [71]
  72. [72]
    hpc/ior: IOR and mdtest - GitHub
    This repository contains the IOR and mdtest parallel I/O benchmarks. The official IOR/mdtest documentation can be found in the docs/ subdirectory or on Read ...
  73. [73]
    IOR - Lustre Wiki
    Oct 13, 2025 · IOR (Interleaved or Random) is a commonly used file system benchmarking application particularly well-suited for evaluating the performance of parallel file ...
  74. [74]
    NAS Parallel Benchmarks - NASA Advanced Supercomputing Division
    Jun 18, 2024 · The NAS Parallel Benchmarks (NPB) are programs to evaluate parallel supercomputer performance, derived from CFD applications, with five kernels ...
  75. [75]
    Graph 500 | large-scale benchmarks
    Data intensive supercomputer applications are increasingly important for HPC ... Graph 500 will establish a set of large-scale benchmarks for these applications.
  76. [76]
    [PDF] Accelerating Time-to-Solution for Computational Science and ...
    To minimize the time-to-solution of a computational science or engineering problem, the time to write and also run the program must both be considered.
  77. [77]
    [PDF] HPC Cloud for Scientific and Business Applications - arXiv
    FLOPS per-dollar is an important metric. From ... Exploiting redundancy for cost-effective, time-constrained execution of HPC applications on Amazon EC2.
  78. [78]
    [PDF] EXASCALE REQUIREMENTS REVIEW - Argonne Blogs
    ... scale climate and Earth system modeling; the interdependence of climate effects and ecosystems; and integrated analysis of climate impacts on energy and related.
  79. [79]
  80. [80]
    Millennium Simulation
  81. [81]
    AlphaFold - Google DeepMind
    In 2020, AlphaFold solved this problem, with the ability to predict protein structures in minutes, to a remarkable degree of accuracy. That's helping ...
  82. [82]
    LatticeQCD - Exascale Computing Project
    The LatticeQCD project is implementing scalable QCD algorithms to realistically simulate the atomic nucleus to reveal a deeper understanding of the fundamental ...
  83. [83]
    HPC, Simulation and Data Science | Lawrence Livermore National ...
    ... (HPC), simulation and data science, impact an array of mission challenges ... modeling, statistical inference, uncertainty quantification and more. Learn ...
  84. [84]
    How the Cloud Has Evolved Over the Past 10 Years - Dataversity
    Apr 6, 2021 · By 2010, Amazon, Google, Microsoft, and OpenStack had all launched cloud divisions. This helped to make cloud services available to the masses.
  85. [85]
    High Performance Computing (HPC) - Amazon AWS
    Using AWS, expedite your high performance computing (HPC) workloads & save money by choosing from low-cost pricing models that match utilization needs.
  86. [86]
    High-Performance Computing (HPC) on Azure - Microsoft Learn
    Dec 12, 2024 · Azure Batch is a platform service for running large-scale parallel and HPC applications efficiently in the cloud. Azure Batch schedules compute ...
  87. [87]
    Google Cloud Introduces the Cloud HPC Toolkit
    May 27, 2022 · An open source tool that enables users to easily create repeatable, turnkey HPC clusters based on proven best practices.
  88. [88]
    The Cloud Wars: AWS vs. Microsoft Azure vs. Google Cloud - History ...
    Oct 2, 2024 · Explore the ongoing Cloud Wars between industry giants—AWS, Microsoft Azure, and Google Cloud. Learn about the history, key players, ...
  89. [89]
    What is HPC or High-Performance Computing in the Cloud
    Dec 19, 2023 · With cloud HPC, you can spin up and down computing resources in minutes, letting you test different algorithms and hyperparameters quickly. A ...
  90. [90]
    Dispelling Cloud Cost Myths: Why Cloud Wins for Research and AI
    Sep 16, 2025 · Cloud cost myths debunked: Research teams save up to 90% on HPC and GPU computing with AWS and Azure spot instances.
  91. [91]
    High-Performance Computing in the Cloud: Opportunities and ...
    Oct 3, 2025 · Many cloud services, such as AWS Batch or Azure Batch, natively support containerized workloads, streamlining the execution of complex HPC ...
  92. [92]
    What are the latency differences between InfiniBand and 100GbE ...
    Sub-microsecond latency: InfiniBand typically achieves end-to-end latencies in the range of 0.7 to 1.5 microseconds, depending on configuration and hardware.
  93. [93]
    [PDF] High-Performance Computing (HPC) Security: Architecture, Threat ...
    Feb 2, 2024 · This document covers High-Performance Computing Security, including architecture, threat analysis, and security posture.
  94. [94]
    Cloud bursting for research computing - AWS Prescriptive Guidance
    The cloud addresses these challenges with hybrid compute and storage solutions that let you burst research computing into the cloud when on-premises capacity ...
  95. [95]
    Cloud Computing | NASA Center for Climate Simulation
    The Science Managed Cloud Environment (SMCE) is a managed Amazon Web Service (AWS) based infrastructure for NASA funded projects that can leverage cloud ...
  96. [96]
    TOP500 List - June 2025
    The top 3 systems are El Capitan (1,742.00 PFlop/s), Frontier (1,353.00 PFlop/s), and Aurora (1,012.00 PFlop/s).
  97. [97]
    Frontier - Oak Ridge Leadership Computing Facility
    Frontier is an exascale supercomputer, the first of its kind, achieving a quintillion calculations per second, and is the world's fastest on the 59th TOP500 ...
  98. [98]
    Aurora Exascale Supercomputer - Argonne National Laboratory
    U.S. Department of Energy's INCITE program seeks proposals for 2025 to advance science and engineering at U.S. leadership computing facilities. The INCITE ...
  99. [99]
    Supercomputer Fugaku : Fujitsu Global
    The supercomputer Fugaku, jointly developed by RIKEN and Fujitsu, has successfully retained the top spot for 11 consecutive terms in the Graph500 BFS.
  100. [100]
    Highlights - June 2025 - TOP500
    The systems of the TOP500 are ranked by how much computational performance they deliver on the HPL benchmark per Watt of electrical power consumed.
  101. [101]
    Figure 9 Annual energy usage by the top 10 supercomputers ... - WIPO
    Figure 9: Annual energy usage by the top 10 supercomputers ... Aurora 254.25 GWh/y, Supercomputer Fugaku 196.44 GWh/y, El Capitan 194.35 GWh/y, Frontier 161.67 GWh/y ...
  102. [102]
    Intel Aims For Zettaflops By 2027, Pushes Aurora Above 2 Exaflops
    Oct 27, 2021 · This is slated to take more than 9,000 nodes, which means more than 18,000 Sapphire Rapids CPUs and more than 54,000 Ponte Vecchio GPU ...
  103. [103]
    Exascale Computing | PNNL
    The computer is being developed at Argonne National Laboratory and at its peak will have a performance of more than 2 exaFLOPS. It will be used for a range of ...
  104. [104]
    Forget Zettascale, Trouble is Brewing in Scaling Exascale ... - HPCwire
    Nov 14, 2023 · In 2021, Intel famously declared its goal to get to zettascale supercomputing by 2027, or scaling today's Exascale computers by 1,000 times.
  105. [105]
    Japan Unveils Plans for Zettascale Supercomputer: 100 PFLOPs of ...
    Aug 28, 2024 · With a target completion date of 2030, the new supercomputer aims to surpass current technological boundaries, potentially becoming the world's ...
  106. [106]
    Are neuromorphic systems the future of high-performance computing?
    Mar 8, 2022 · Scientists and engineers are developing computing technologies that mimic how neurons operate in the brain. This is not just about building faster computers.
  107. [107]
    The road to commercial success for neuromorphic technologies
    Apr 15, 2025 · DC placement implies interfacing of NM compute with standard high-performance computing (HPC) busses, and with HPC scheduling systems.
  108. [108]
    Neuromorphic Computing - Human Brain Project
    Compared to traditional HPC resources, the Neuromorphic systems potentially offer higher speed (real-time or accelerated) and lower energy consumption. The ...
  109. [109]
    Tensor Processing Units (TPUs) - Google Cloud
    Cloud TPUs are designed to scale cost-efficiently for a wide range of AI workloads, spanning training, fine-tuning, and inference. Cloud TPUs provide the ...
  110. [110]
    AI-coupled HPC Workflow Applications, Middleware and Performance
    Jun 20, 2024 · This paper surveys the diverse and rapidly evolving field of AI-driven HPC and provides a common conceptual basis for understanding AI-driven HPC workflows.
  111. [111]
    What is Quantum-Centric Supercomputing? - IBM
    Quantum-centric supercomputing is a revolutionary approach to computer science that combines quantum computing with traditional high-performance computing (HPC)
  112. [112]
    Building software for quantum-centric supercomputing - IBM
    Sep 15, 2025 · IBM has spent years laying the groundwork for a future where quantum and classical high-performance computing (HPC) systems work together to ...
  113. [113]
    Interfacing Quantum Computing Systems with High-Performance ...
    Sep 7, 2025 · Section 5 illustrates practical application domains where hybrid HPC-QC systems have shown significant promise, specifically optimization ...
  114. [114]
    A Hybrid Approach for Solving Optimization Problems on Small ...
    Jun 18, 2019 · A hybrid quantum and classical approach may be the answer to tackling this problem with existing quantum hardware. Approach. A team of ...
  115. [115]
    Green HPC: Sustainability in the Data Centers of Tomorrow
    Apr 16, 2025 · This not only saves electricity, but also significantly reduces CO2 emissions. In addition, CLAIX-2023 is powered exclusively by renewable ...
  116. [116]
    This is why we are a climate-neutral company - HPC AG
    The greenhouse gases caused by HPC AG are recorded and compensated. This makes us one of the first companies in our industry to voluntarily compensate for ...
  117. [117]
    Photonic Interconnects - an overview | ScienceDirect Topics
    Interconnect will be the bottleneck as it will consume 80% of the power, 40% of the system cost, and will be responsible for 50–90% of the performance: ...
  118. [118]
    Photonics for sustainable AI | Communications Physics - Nature
    Oct 14, 2025 · This high energy efficiency of photonic computing helps to achieve a low OCF as well as high throughput in photonic systems. Photonic chips ...
  119. [119]
    [PDF] Edge Computing in IoT: A 6G Perspective - arXiv
    May 15, 2022 · Abstract—Edge computing is one of the key driving forces to enable Beyond 5G (B5G) and 6G networks. Due to the unprecedented increase in ...
  120. [120]
    Edge Computing in the Internet of Things: A 6G Perspective
    This article presents key considerations for edge deployments in B5G/6G networks, including edge architecture, server location, capacity, user density, ...
  121. [121]
    6G—Enabling the New Smart City: A Survey - PMC - PubMed Central
    For example, real-time video analytics can be performed at the edge of the network using edge devices, with the processed data transmitted over 5G networks.