High-performance computing
High-performance computing (HPC), also known as supercomputing, refers to the aggregation of computing power from multiple processors or interconnected computers to perform complex calculations and simulations at speeds far exceeding those of standard desktop or workstation systems.[1] This approach leverages parallel processing techniques, where tasks are divided across numerous processing units to solve large-scale problems in fields such as science, engineering, and business that would otherwise be computationally infeasible or excessively time-consuming.[2]

The history of HPC traces back to the mid-20th century, driven by the need for advanced computational capabilities in military and scientific applications. Early milestones include the ENIAC (1945), the first general-purpose electronic computer, which marked the beginning of programmable computing for complex calculations.[3] The 1960s saw the emergence of dedicated supercomputers like the CDC 6600 (1964), which achieved speeds of up to 3 MFLOPS and is often credited as the first true supercomputer.[4] The 1970s and 1980s were dominated by vector-processing machines from Cray Research, such as the Cray-1 (1976) at 160 MFLOPS, which introduced innovative cooling and architecture to push performance boundaries. By the 1990s, the shift to massively parallel processing and affordable clusters, exemplified by the Beowulf project (1994) using off-the-shelf PCs to achieve 1 GFLOP/s for under $50,000, democratized access to high performance.[3] Moore's Law, positing that transistor density doubles approximately every two years, fueled this evolution until physical limits on clock speeds led to multi-core processors and hybrid systems integrating GPUs in the 2000s.[2] Today, systems like El Capitan (2024) at Lawrence Livermore National Laboratory exceed 1.7 exaFLOPS as of June 2025, ranking atop the TOP500 list of the world's fastest supercomputers, marking the exascale era with multiple systems surpassing 1 exaFLOPS.[5]

Key technologies in HPC include clusters of interconnected nodes, each comprising multi-core CPUs, high-bandwidth memory, and accelerators like GPUs (e.g., NVIDIA Tesla series with thousands of CUDA cores) or Intel Xeon Phi processors, enabling massive parallelism.[2] High-speed interconnects such as InfiniBand facilitate low-latency communication between nodes, while scalable storage systems handle petabytes of data.[2] Software ecosystems, including parallel programming models like MPI (Message Passing Interface) and OpenMP, optimize workload distribution, and benchmarks like HPL (High-Performance Linpack) measure system performance in floating-point operations per second (FLOPS).[3] Recent advancements incorporate machine learning accelerators and energy-efficient architectures to address power consumption challenges, with modern systems drawing tens of megawatts.[3]

HPC plays a pivotal role in addressing grand challenges across disciplines, enabling breakthroughs that would be impossible with conventional computing.
In scientific research, it powers climate modeling, astrophysics simulations, and seismic analysis to predict environmental changes and natural disasters.[1] Engineering applications include aerodynamic design, materials science, and automotive crash simulations, accelerating innovation in industries like aerospace and manufacturing.[6] In healthcare and bioinformatics, HPC facilitates drug discovery, genomic sequencing, and personalized medicine by processing vast datasets rapidly.[6] Emerging uses in artificial intelligence and big data analytics further amplify its impact, supporting machine learning training on exabyte-scale information for fields like finance and social sciences.[6] Overall, HPC's scalability and speed reduce computation times from years to days, fostering informed decision-making and economic growth through enhanced research productivity.[6]

Fundamentals
Definition and Scope
High-performance computing (HPC) refers to the aggregation of computational resources to perform advanced calculations and simulations that exceed the capabilities of standard desktop or server systems, enabling the solution of complex scientific and engineering problems.[7] This involves leveraging multiple processors or nodes working together to achieve significantly higher performance levels, often measured in floating-point operations per second (FLOPS).[1] The scope of HPC encompasses supercomputing systems, distributed clusters, and parallel processing frameworks designed for large-scale data processing and modeling, such as those used in climate simulations or molecular dynamics.[8] These systems prioritize sustained high-speed computation for resource-intensive tasks that require massive parallelism, distinguishing HPC from conventional computing by its focus on scalability and efficiency in handling multidimensional datasets.[9]

HPC differs from high-throughput computing (HTC), which emphasizes queuing and executing numerous independent tasks over extended periods to maximize overall resource utilization, whereas HPC targets peak performance for tightly coupled parallel workloads that demand rapid data exchange between components.[10] In HTC, the goal is long-term productivity through batch processing, while HPC seeks immediate, high-intensity bursts of computation.[11] The term "high-performance computing" emerged in the 1990s as a broader descriptor replacing "supercomputing," reflecting the shift toward accessible cluster-based systems rather than specialized monolithic machines, driven by advances in commodity hardware and networking.[12]

Examples of HPC problem scales include petascale computing, which operates at approximately 10^15 FLOPS to model phenomena like weather patterns or protein folding, and exascale computing, operating at 10^18 FLOPS to enable finer-grained simulations such as full-system climate modeling or drug discovery at atomic levels.[13] Petascale systems represent a foundational milestone achieved in the late 2000s, while exascale provides a thousandfold increase in capability for unprecedented accuracy in predictive modeling, as demonstrated since 2022.[14][15]

Key Concepts
High-performance computing (HPC) relies on parallel computing paradigms to distribute workloads across multiple processors, enabling efficient execution of complex simulations and data analyses. Data parallelism involves dividing a large dataset into smaller subsets, with each processor performing the same operation on its assigned portion simultaneously, which is particularly effective for embarrassingly parallel tasks like matrix multiplications or image processing.[16] In contrast, task parallelism decomposes a program into independent subtasks that can execute concurrently on different processors, allowing diverse operations such as sorting one dataset while analyzing another.[16] Combinations of these, known as hybrid parallelism, integrate data and task approaches, often using distributed memory for inter-node coordination and shared memory within nodes to optimize resource utilization in cluster environments.[17]

Scalability in HPC refers to a system's ability to maintain or improve performance as computational resources, such as the number of processors, increase. Amdahl's Law provides a theoretical limit on speedup for fixed-size problems, stating that the maximum speedup S achievable with s processors is given by the equation

S = \frac{1}{(1 - p) + \frac{p}{s}}

where p is the fraction of the program that can be parallelized, and 1 - p is the inherently serial portion that remains a bottleneck regardless of added processors.[18][19] This law highlights that even small serial fractions can severely cap overall gains, emphasizing the need to minimize sequential code in HPC applications. As a counterpoint, Gustafson's Law addresses scaled problems where workload expands with available resources, proposing that speedup S with s processors is

S = s p + (1 - p)

where p is now the parallelizable fraction adjusted for the scaled problem size, allowing near-linear speedups since serial components grow proportionally slower than parallel ones.[20][21] This perspective is more applicable to many HPC scenarios, such as climate modeling, where larger datasets leverage additional processors effectively.

Memory hierarchies in HPC optimize data access by organizing storage across levels with varying speed, capacity, and cost, from fast but small caches to slower, larger main memory. Cache memory forms the levels closest to the processor, typically including L1 (on-core, kilobytes, sub-nanosecond access), L2 (per-core or shared, megabytes, few nanoseconds), and L3 (shared across cores, tens of megabytes, tens of nanoseconds), which temporarily hold frequently used data to reduce latency from main memory fetches.[22] In shared memory models, all processors access a unified address space, facilitating easy data sharing but limited to single nodes due to hardware constraints like bus contention.[23] Conversely, distributed memory models assign private memory to each processor or node, requiring explicit communication via message passing for data exchange, which scales better for large clusters but introduces overhead from network latency.[24]

A fundamental metric for evaluating HPC performance is FLOPS (floating-point operations per second), quantifying the rate of arithmetic computations essential for scientific simulations.
Tiers include teraFLOPS (10^{12} operations per second) for mid-range systems, petaFLOPS (10^{15}) for the leading supercomputers of the 2010s, and exaFLOPS (10^{18}) for exascale capabilities achieved since 2022, providing a standardized measure of sustained computational power.[25][15]
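The contrast between the two scaling laws above is easiest to see numerically. The short C sketch below is illustrative only: the parallel fraction p = 0.95 and the processor counts are arbitrary assumptions rather than measurements of any real system.

```c
#include <stdio.h>

/* Amdahl's Law: fixed problem size, speedup capped by the serial fraction. */
static double amdahl(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

/* Gustafson's Law: workload scales with s, so speedup grows almost linearly. */
static double gustafson(double p, double s)
{
    return s * p + (1.0 - p);
}

int main(void)
{
    const double p = 0.95;                      /* assumed parallelizable fraction */
    const int procs[] = {1, 16, 256, 4096, 65536};
    const size_t n = sizeof procs / sizeof procs[0];

    printf("%8s %12s %12s\n", "procs", "Amdahl", "Gustafson");
    for (size_t i = 0; i < n; ++i)
        printf("%8d %12.2f %12.2f\n", procs[i],
               amdahl(p, (double)procs[i]), gustafson(p, (double)procs[i]));
    return 0;
}
```

With p = 0.95, Amdahl's Law never exceeds a speedup of 20 no matter how many processors are added, whereas Gustafson's Law predicts roughly 0.95 s for scaled workloads, which is why the latter better describes ensemble-style HPC runs whose problem sizes grow with the machine.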
History
Early Developments
The origins of high-performance computing (HPC) trace back to the mid-20th century, when the demand for rapid numerical calculations during World War II spurred the development of the first programmable electronic computers. The ENIAC (Electronic Numerical Integrator and Computer), completed in 1945 at the University of Pennsylvania, represented a foundational milestone as the first general-purpose electronic digital computer designed primarily for ballistics trajectory calculations to support U.S. Army artillery efforts.[26] Funded by the U.S. Army Ordnance Department amid the escalating needs of the war, ENIAC utilized approximately 18,000 vacuum tubes to achieve speeds of up to 5,000 additions per second, dramatically reducing computation times from hours to seconds for complex problems.[27] However, its reliance on vacuum tube technology introduced significant challenges, including high power consumption of 150 kW, which generated substantial heat requiring massive cooling systems weighing several tons, and frequent tube failures that compromised reliability, with mean time between failures often measured in hours.[28][29]

The transition to the 1950s and 1960s marked the shift toward more specialized architectures optimized for scientific computing, driven by Cold War imperatives in nuclear research and space exploration. Vector processors emerged as a key innovation, enabling efficient handling of arrays of data for high-speed numerical operations. Seymour Cray's design of the CDC 6600, released by Control Data Corporation in 1964, is widely regarded as the first supercomputer, achieving a peak performance of 3 MFLOPS through its innovative use of multiple functional units and transistor-based logic that superseded vacuum tubes.[30] This machine addressed some reliability issues of earlier systems by leveraging transistors, which were more durable and power-efficient, though cooling remained a persistent challenge due to the heat from densely packed components operating at high speeds. U.S. government agencies played a pivotal role in these advancements; the Atomic Energy Commission (AEC, predecessor to the Department of Energy or DOE) funded supercomputing at national laboratories like Los Alamos for nuclear simulations, while NASA supported vector processor developments for aerospace modeling during the space race.[27]

By the 1970s, the focus evolved toward parallel processing to overcome the limitations of single-processor designs, laying groundwork for scalable HPC systems.
The ILLIAC IV, designed at the University of Illinois and brought into operation in the early 1970s, pioneered this approach with its array of 64 processing elements arranged in a single-instruction, multiple-data (SIMD) configuration, enabling simultaneous operations on large datasets for applications like fluid dynamics simulations.[31] Funded primarily by the Defense Advanced Research Projects Agency (DARPA) as part of broader Cold War efforts to enhance computational capabilities for defense and scientific research, the ILLIAC IV demonstrated the potential of massively parallel architectures but highlighted ongoing challenges in synchronization and interconnect reliability, compounded by the thermal management demands of its transistor arrays.[32] These early systems, supported by federal initiatives from the DOE, NASA, and military agencies, established the conceptual and technological foundations for HPC, emphasizing the interplay between hardware innovation, government investment, and the imperative to solve computationally intensive problems in an era of geopolitical tension.[27]

Major Milestones
The TOP500 project, launched in 1993, established a biannual ranking of the world's 500 most powerful supercomputers based on their performance in the High-Performance Linpack benchmark, providing a standardized metric to track advancements in high-performance computing (HPC) capabilities.[33] This initiative, initiated by researchers at the University of Mannheim and the University of Tennessee, marked a pivotal milestone by fostering global competition and enabling systematic analysis of HPC trends, with the first list published in June 1993 featuring the Thinking Machines CM-5 as the top system.[33]

In 1996, the U.S. Department of Energy (DOE) initiated the Accelerated Strategic Computing Initiative (ASCI), a program aimed at developing advanced simulation capabilities to support stockpile stewardship in the absence of nuclear testing, which accelerated the push toward terascale computing.[25] The ASCI effort involved collaborations among national laboratories and industry partners, culminating in systems like ASCI White, which by 2000 achieved terascale performance exceeding 7 teraFLOPS, enabling complex multidimensional simulations previously infeasible.[34]

The introduction of the Green500 list in November 2007 represented a significant shift toward energy efficiency in HPC, ranking supercomputers by performance per watt to complement raw computational power metrics and encourage sustainable designs.[35] Developed by researchers at Virginia Tech, the list highlighted the growing power consumption challenges in scaling HPC systems, with the inaugural edition showing efficiencies up to 357 megaFLOPS per watt, prompting innovations in hardware and cooling technologies.[36]

The transition to petascale computing was epitomized by IBM's Roadrunner supercomputer, deployed at Los Alamos National Laboratory in 2008, which became the first to sustain over 1 petaFLOP on the Linpack benchmark at 1.026 petaFLOPS, with a peak of 1.7 petaFLOPS.[37] This hybrid architecture, combining AMD Opteron processors and IBM Cell Broadband Engine chips across 12,640 nodes, demonstrated scalable heterogeneous computing for scientific applications like climate modeling and nuclear simulations.[38]

Internationally, Japan's Fugaku supercomputer, developed by RIKEN and Fujitsu and operational from 2020, achieved 442 petaFLOPS on the TOP500 list while pioneering integration of HPC with artificial intelligence workloads through dedicated AI accelerators and software frameworks.[39] Fugaku's Arm-based A64FX processors enabled hybrid simulations, such as drug discovery and earthquake modeling combined with machine learning, marking a milestone in versatile, high-impact computing for multidisciplinary research.[40]

Exascale computing efforts reached an initial milestone with the Frontier supercomputer at Oak Ridge National Laboratory in 2022, the first to exceed 1 exaFLOP on the Linpack benchmark at 1.102 exaFLOPS, powered by AMD EPYC CPUs and Instinct MI250X GPUs in an HPE Cray EX system.[41] This DOE-funded machine, comprising over 9,400 compute nodes, advanced capabilities in areas like fusion energy research and materials science, while emphasizing power efficiency with 52.7 gigaFLOPS per watt on the Green500 list as of June 2022.[42] Subsequent systems continued the exascale era, including Aurora at Argonne National Laboratory (2023, 1.012 exaFLOPS) and El Capitan at Lawrence Livermore National Laboratory (2024, 1.742 exaFLOPS), the latter ranking as the world's fastest as of June 2025.[5]

Architectures and Technologies
Hardware Components
High-performance computing (HPC) systems are built from specialized hardware components optimized for parallel processing, massive data throughput, and energy efficiency under extreme workloads. These include advanced processors for computation, high-speed interconnects for node communication, high-bandwidth memory and storage for data access, cooling solutions to manage thermal loads, and scalable node architectures that form the system's backbone.[43]

Processors are the primary computational engines in HPC nodes, with multi-core central processing units (CPUs) providing versatile general-purpose performance. Intel Xeon and AMD EPYC processors dominate this space, featuring dozens to hundreds of cores per chip to handle parallel tasks in simulations and data analysis; for example, the AMD EPYC 9005 series offers up to 192 cores with thermal design powers ranging from 155 to 500 W, enabling efficient scaling in large clusters.[44] Graphics processing units (GPUs) excel in highly parallel operations like tensor computations, with the NVIDIA H100 Tensor Core GPU delivering up to 67 TFLOPS in single-precision floating-point performance and 3.35 TB/s memory bandwidth through its HBM3, achieving up to 3 times the throughput of the A100 for HPC workloads.[45] Accelerators such as field-programmable gate arrays (FPGAs) provide reconfigurable hardware for custom algorithms, with AMD Versal Premium VP1902 FPGAs demonstrating up to 2.5 TFLOPS peak floating-point performance in scientific applications due to their adaptable logic blocks.[46]

Interconnects enable low-latency data exchange across thousands of nodes, critical for distributed computing. High-speed networks like NVIDIA's InfiniBand support bandwidths up to 800 Gbps per port with the XDR generation as of 2025, incorporating in-network computing engines to offload tasks like collective operations and reduce CPU overhead in supercomputers.[47] Ethernet variants, such as 400 GbE, serve as alternatives for cost-sensitive deployments, though InfiniBand's lower latency makes it preferable for tightly synchronized HPC tasks.[48]

Memory systems in HPC prioritize bandwidth to feed processors without bottlenecks. High-bandwidth memory (HBM) uses 3D-stacked DRAM to deliver over 3 TB/s throughput with HBM3E, as seen in GPU integrations where it supports simultaneous access from multiple cores in memory-intensive simulations.[49] For storage, non-volatile memory express (NVMe) solid-state drives leverage PCIe interfaces for high input/output operations per second (IOPS), providing up to 10 million IOPS per server in parallel setups to accelerate data loading in large-scale computations.[50]

Cooling technologies are essential to sustain performance amid rising power demands. Modern HPC racks often exceed 20 kW, with AI workloads reaching up to 120 kW per rack as of 2025, necessitating advanced methods like liquid immersion cooling, which submerges components in non-conductive fluids for efficient heat dissipation, and direct-to-chip liquid cooling, which targets processor hotspots to handle densities over 350 W/cm².[51] These solutions maintain operational temperatures below critical thresholds, preventing thermal throttling in dense deployments.[52]

Node designs determine system scalability and integration.
HPC clusters comprise loosely coupled commodity servers interconnected via networks, allowing modular expansion and cost-effective scaling to exascale levels through off-the-shelf components.[43] In contrast, massively parallel processing (MPP) systems use tightly coupled nodes with shared memory hierarchies and custom interconnects, optimizing for high-throughput workloads like weather modeling where low inter-node latency is paramount.[53]
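The peak throughput figures quoted for these components follow from a simple product of parallel units, clock rate, and floating-point operations completed per cycle. As a purely illustrative calculation with round, hypothetical numbers (a 96-core CPU at 2.4 GHz sustaining 16 double-precision FLOPs per core per cycle, not the specification of any particular product):

\text{Peak} = 96 \times (2.4 \times 10^{9}\ \text{cycles/s}) \times 16\ \text{FLOPs/cycle} \approx 3.7 \times 10^{12}\ \text{FLOPS} \approx 3.7\ \text{TFLOPS}

Sustained performance typically falls well below such peaks because the memory system must supply operands quickly enough, which is one reason GPUs and other accelerators pair very high FLOPS with high-bandwidth memory.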
Software and Programming Models

High-performance computing (HPC) relies on specialized software ecosystems to exploit the capabilities of parallel hardware architectures, enabling efficient execution of computationally intensive tasks across distributed and shared memory systems. These software components include programming models, libraries, resource management systems, operating systems, and development tools, all designed to optimize performance, portability, and scalability in cluster environments.[54]

Parallel programming models form the foundation for developing HPC applications, with the Message Passing Interface (MPI) serving as the de facto standard for distributed-memory systems since its initial specification in 1994. MPI provides a portable interface for point-to-point communication, collective operations, and process management, allowing programs to coordinate across multiple nodes in a cluster.[54] The current version, MPI-5.0 released in 2025, extends support for advanced features like persistent communication and one-sided operations to enhance efficiency on modern interconnects.[54] For shared-memory parallelism within a single node, OpenMP offers a directive-based approach that simplifies multi-threading in C, C++, and Fortran codes without explicit thread management. OpenMP, standardized by the OpenMP Architecture Review Board, enables scalable loop-level and task-based parallelism, with version 6.0 (2024) extending earlier support for offloading to accelerators like GPUs.[55]

Key libraries abstract low-level hardware operations, facilitating high-performance numerical computations. The Basic Linear Algebra Subprograms (BLAS) provide optimized routines for vector and matrix operations, forming a foundational layer for many scientific algorithms.[56] Built upon BLAS, the Linear Algebra Package (LAPACK) delivers high-level solvers for systems of linear equations, eigenvalue problems, and singular value decompositions, ensuring numerical stability and efficiency in dense linear algebra tasks.[57] For GPU-accelerated computing, NVIDIA's Compute Unified Device Architecture (CUDA) enables direct programming of graphics processing units through a C/C++-like extension, supporting kernel launches, memory management, and thread hierarchies to achieve massive parallelism.[58]

Job scheduling systems manage resource allocation and workload distribution in HPC clusters to maximize utilization and minimize wait times. Slurm Workload Manager, an open-source solution widely adopted since 2003, handles job queuing, dependency resolution, and fault-tolerant scheduling for Linux-based clusters of varying scales.[59] Similarly, the Portable Batch System (PBS), originating from NASA's 1990s implementation and now available as open-source OpenPBS, supports batch job submission, priority-based queuing, and integration with diverse hardware configurations.[60]

Optimized operating systems underpin HPC deployments, with Linux distributions tailored for cluster environments. Rocks Cluster Distribution, based on CentOS, automates the installation and configuration of compute nodes, frontend servers, and networking for rapid cluster provisioning.[61]

Debugging and optimization tools are essential for identifying bottlenecks and ensuring reliability in parallel codes.
Valgrind, a dynamic instrumentation framework, detects memory leaks, invalid accesses, and threading errors at runtime, supporting multiple architectures including x86 and ARM.[62] The Tuning and Analysis Utilities (TAU) toolkit provides portable profiling and tracing for parallel programs in languages like Fortran, C, and Python, capturing metrics such as execution time, hardware counters, and I/O events to guide performance tuning.[63]
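To make the programming models above concrete, the following sketch combines MPI for communication between distributed-memory processes with OpenMP threading inside each process, computing a global dot product via MPI_Allreduce. The array size and data values are arbitrary assumptions for illustration, and real applications would add error checking and use the site's compiler wrappers and batch scheduler.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long n_local = 1L << 20;               /* elements owned by each rank (assumed) */
    double *x = malloc(n_local * sizeof *x);
    double *y = malloc(n_local * sizeof *y);
    for (long i = 0; i < n_local; ++i) {         /* synthetic data */
        x[i] = 1.0;
        y[i] = 2.0;
    }

    /* OpenMP: threads within a process share memory and split the local loop. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < n_local; ++i)
        local_sum += x[i] * y[i];

    /* MPI: explicit message passing combines the per-process partial sums. */
    double global_sum = 0.0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("dot product = %.1f across %d ranks, up to %d threads each\n",
               global_sum, nprocs, omp_get_max_threads());

    free(x);
    free(y);
    MPI_Finalize();
    return 0;
}
```

A typical build and launch would resemble mpicc -fopenmp dot.c -o dot followed by mpirun -np 4 ./dot, with the job itself submitted through a scheduler such as Slurm or PBS; exact wrappers and directives vary by installation.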
Performance Evaluation
TOP500 Project
The TOP500 project was launched in 1993 by Hans Meuer of the University of Mannheim, Erich Strohmaier, and Jack Dongarra of the University of Tennessee, Knoxville, as an initiative to track and analyze trends in high-performance computing systems. The first list was presented at the International Supercomputing Conference (ISC) in Germany, compiling performance data from 116 supercomputers based on voluntary submissions from owners and vendors. This effort aimed to provide a standardized, reliable snapshot of global HPC capabilities, evolving into a biannual ranking of the 500 most powerful non-distributed computer systems worldwide.[64][65]

The project's methodology centers on the High-Performance Linpack (HPL) benchmark, a standardized test that ranks systems by their sustained floating-point operations per second (FLOPS) in solving a dense system of linear equations using Gaussian elimination. HPL measures Rmax, the achieved performance on randomly generated matrices, expressed in teraflops (TFLOPS), petaflops (PFLOPS), or exaflops (EFLOPS), rather than theoretical peak performance (Rpeak). Submissions must include detailed hardware configurations, installation dates, and results validated against official HPL rules to ensure comparability, with the full list published alongside statistical analyses of architectural trends, processor types, and interconnects.[64]

Updates occur twice annually—in June at the ISC High Performance conference in Hamburg, Germany, and in November at the SC (Supercomputing) conference in the United States—allowing the list to reflect rapid advancements in HPC technology. Each edition includes not only the ranked systems but also aggregated data on performance growth, market shares by vendor and country, and emerging metrics like energy efficiency, which have become increasingly relevant as systems scale. As of the most recent list (June 2025), El Capitan retains the No. 1 position; the next list is scheduled for November 2025 at SC25.[64][66][5]

The TOP500 has profoundly influenced the HPC ecosystem by spurring international competition among governments, research institutions, and manufacturers, leading to accelerated innovation in processor architectures and system designs. It has illuminated long-term trends, such as the exponential growth in computational power alongside rising power demands, with top systems evolving from approximately 850 kW consumption in 1993 (e.g., the inaugural No. 1 CM-5/1024) to multi-megawatt scales today, as exemplified by El Capitan's roughly 30 MW draw. However, critics argue that its focus on HPL performance overemphasizes raw speed at the expense of real-world efficiency, since Linpack results often represent an optimistic upper bound compared to diverse application workloads.[64][67][68][69]
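The Rmax values behind these rankings follow from the operation count of the HPL solve: factoring and solving an n × n dense system by Gaussian elimination costs roughly \frac{2}{3}n^{3} + 2n^{2} floating-point operations, and Rmax is that count divided by the measured wall-clock time. As a hypothetical illustration (the matrix size is not taken from any actual submission), a run with n = 10^{7} performs about \frac{2}{3} \times 10^{21} \approx 6.7 \times 10^{20} operations, so a machine sustaining 1 exaFLOPS (10^{18} operations per second) would finish it in roughly 670 seconds, a little over 11 minutes.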
Other Benchmarks and Metrics

Beyond the TOP500's focus on peak floating-point performance via the LINPACK benchmark, several other metrics evaluate high-performance computing (HPC) systems in terms of energy efficiency, memory-bound operations, input/output (I/O) capabilities, and application-specific workloads. These benchmarks address limitations in dense linear algebra tests by emphasizing real-world patterns such as sparse computations, data movement, and scalability across diverse scientific domains.[70]

The Green500 list ranks supercomputers by energy efficiency, measuring performance in gigaflops per watt (GFlops/W) to highlight power consumption alongside computational capability. Launched in 2007 as a complement to the TOP500, it promotes sustainable HPC designs by incentivizing low-power architectures, such as those using ARM processors or advanced cooling. As of the June 2025 list, the top system, JEDI (the first module of JUPITER), remained No. 1 at 72.733 GFlops/W. Trends show continued progress in energy efficiency for exascale systems, with JUPITER averaging roughly 11 MW of power consumption (up to 17 MW maximum).[71][72][73]

The High-Performance Conjugate Gradient (HPCG) benchmark assesses systems on sparse matrix operations, which are prevalent in real-world simulations like fluid dynamics, climate modeling, and electromagnetics. Developed by Michael Heroux at Sandia National Laboratories and released in 2013, HPCG uses a multigrid-preconditioned conjugate gradient solver involving sparse matrix-vector multiplications, vector updates, and global reductions, typically yielding 5-10% of LINPACK performance due to its memory-intensive nature. It ranks systems biannually alongside TOP500 results, with top scores as of June 2025 reaching 17.4 petaflops on El Capitan (Frontier at 14.05 petaflops), underscoring the gap between theoretical peak and practical efficiency.[74][70][75]

I/O benchmarks are crucial for petabyte-scale HPC environments, where data throughput can bottleneck computations. The IOR (Interleaved or Random) tool evaluates parallel file system bandwidth and latency using MPI-IO or POSIX interfaces, simulating workloads like large-block writes or shared-file access to measure sustained throughput in gigabytes per second. Complementing IOR, mdtest focuses on metadata operations such as file creation, deletion, and stat calls, testing scalability under high-concurrency scenarios on parallel storage systems like Lustre or GPFS. In large-scale tests, top systems achieve over 100 GB/s aggregate bandwidth, but variability arises from file striping and network contention.[76][77][78]

Application-specific benchmarks provide targeted insights into domain performance. The NAS Parallel Benchmarks (NPB), developed by NASA in the early 1990s, derive from computational fluid dynamics (CFD) kernels and pseudo-applications, evaluating parallel efficiency through metrics like iterations per second on problems involving integer sort (IS), conjugate gradient (CG), and multigrid (MG) solvers. NPB suites scale from small (Class S) to massive (Class F) problems, revealing communication overheads in MPI or OpenMP implementations.
Similarly, the Graph500 benchmark targets big data graph analytics, measuring traversed edges per second in breadth-first search (BFS) on synthetic scale-free graphs up to billions of vertices, as seen in Fugaku's top ranking as of November 2024 of 204.068 tera-traversed edges per second (TEPS) for applications in social networks and bioinformatics.[79][80][81]

Additional metrics emphasize holistic system value. Time-to-solution (TTS) quantifies the total duration to obtain a scientifically valid result, encompassing algorithm development, execution, and post-processing, rather than isolated runtime; for instance, optimizing TTS can reduce overall project timelines by integrating hardware-software co-design. Cost-effectiveness metrics, such as dollars per teraflop-hour, assess economic viability by dividing acquisition and operational costs (including energy) by delivered performance, with cloud-based HPC often achieving under $0.01 per teraflop-hour in 2025 deployments compared to on-premises systems. These approaches prioritize practical impact over raw speed.[82][83]
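The efficiency and cost metrics above reduce to straightforward arithmetic. As a hedged illustration with round numbers that do not correspond to any listed system: a machine sustaining 1 exaFLOPS while drawing 20 MW delivers 10^{18} / (2 \times 10^{7}) = 5 \times 10^{10} FLOP/s per watt, or 50 GFlops/W. At that efficiency, sustaining one teraFLOPS for an hour requires about 20 W, i.e. 0.02 kWh of energy; at an assumed electricity price of $0.10 per kWh, the energy component of one teraflop-hour costs roughly $0.002, before hardware amortization, cooling overhead, and staffing are added.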
Applications
Scientific and Research Domains
High-performance computing (HPC) plays a pivotal role in advancing fundamental scientific research across diverse domains, enabling simulations that capture complex phenomena at scales unattainable by traditional methods. In climate modeling, general circulation models (GCMs) and Earth system models (ESMs) such as CESM2 and ACME integrate atmospheric, oceanic, and biogeochemical processes to generate projections for Intergovernmental Panel on Climate Change (IPCC) assessments. These models demand exascale computing to achieve high resolutions—targeting 1 km globally or finer regionally—while running large ensembles of 100 or more members to quantify uncertainties in phenomena like extreme weather, sea-level rise, and climate sensitivity. For instance, CMIP6 simulations under CESM2 require approximately 260 million core-hours to produce over 25,000 simulated years, handling data volumes that escalate to exabytes due to high-resolution outputs and observational integrations from facilities like ARM. Recent exascale systems like Frontier have further accelerated these simulations, enabling higher-fidelity Earth system modeling as of 2025.[84][85]

In astrophysics, HPC facilitates N-body simulations that model the gravitational dynamics of dark matter and baryonic matter, elucidating galaxy formation and the large-scale structure of the universe. A seminal example is the Millennium Simulation of 2005, which tracked the evolution of 10 billion particles in a cubic volume spanning 2.23 billion light-years, from redshift z=127 to the present day, on supercomputers at the Max Planck Society's facility in Garching, Germany. This effort, involving over 343,000 CPU-hours, revealed hierarchical structure growth and baryon acoustic oscillations, providing constraints on dark energy and linking galaxy properties to cosmic clustering.[86][87]

Bioinformatics leverages HPC for processing vast genomic datasets and predicting biomolecular structures, accelerating discoveries in biology and medicine. Since 2020, AlphaFold, developed by DeepMind, has utilized powerful computing clusters to predict the 3D structures of over 200 million proteins with near-atomic accuracy, transforming protein folding research that previously required years of experimentation. These predictions, generated in minutes per structure, support genome sequencing analyses by elucidating protein functions and interactions, with applications in drug design and disease modeling; the AlphaFold Protein Structure Database now aids over 2 million researchers worldwide. As of 2024, AlphaFold 3 extends these capabilities to predict interactions with DNA, RNA, and ligands.[88]

Particle physics relies on HPC for lattice quantum chromodynamics (QCD) calculations, which simulate quark-gluon interactions to probe the strong force underlying atomic nuclei. Lattice QCD formulations discretize spacetime to compute non-perturbative effects, essential for interpreting data from experiments like the Large Hadron Collider (LHC) at CERN. These simulations, performed on exascale prototypes, validate Standard Model predictions for hadron masses and scattering processes, with algorithmic advances achieving over 50-fold speedups since 2016 to handle physical volumes approaching continuum limits.[89]

Despite these advances, HPC in scientific simulations faces significant challenges, particularly in data visualization and uncertainty quantification (UQ).
Visualization tools must render petabyte-scale outputs from multiphysics models into interpretable forms, often employing in-situ techniques to process data during simulation and avoid I/O bottlenecks. UQ, crucial for assessing model reliability in climate and particle physics, involves propagating uncertainties through high-dimensional parameter spaces, requiring efficient parallel algorithms on exascale systems to generate statistically robust ensembles without prohibitive computational costs.[90]
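The gravitational N-body simulations mentioned above reduce, at their core, to evaluating pairwise forces and advancing particle positions. The sketch below shows a minimal direct-summation step with OpenMP threading; it is a teaching-scale illustration with assumed parameters (softened Newtonian gravity, arbitrary units, a small particle count) and bears no resemblance to the tree- and mesh-based methods production codes use at billions of particles.

```c
#include <math.h>
#include <stdio.h>

#define N   1024            /* particle count (illustrative only) */
#define G   1.0             /* gravitational constant in code units */
#define EPS 1e-3            /* softening length to avoid singular forces */

static double pos[N][3], vel[N][3], acc[N][3], mass[N];

/* Direct O(N^2) force evaluation: every particle feels every other particle. */
static void compute_accelerations(void)
{
    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        double a[3] = {0.0, 0.0, 0.0};
        for (int j = 0; j < N; ++j) {
            if (j == i) continue;
            double d[3] = { pos[j][0] - pos[i][0],
                            pos[j][1] - pos[i][1],
                            pos[j][2] - pos[i][2] };
            double r2 = d[0]*d[0] + d[1]*d[1] + d[2]*d[2] + EPS*EPS;
            double inv_r3 = 1.0 / (r2 * sqrt(r2));
            for (int k = 0; k < 3; ++k)
                a[k] += G * mass[j] * d[k] * inv_r3;
        }
        for (int k = 0; k < 3; ++k)
            acc[i][k] = a[k];
    }
}

/* One leapfrog (kick-drift-kick) time step of size dt. */
static void step(double dt)
{
    compute_accelerations();
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < 3; ++k) {
            vel[i][k] += 0.5 * dt * acc[i][k];   /* half kick */
            pos[i][k] += dt * vel[i][k];         /* drift */
        }
    compute_accelerations();
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < 3; ++k)
            vel[i][k] += 0.5 * dt * acc[i][k];   /* half kick */
}

int main(void)
{
    for (int i = 0; i < N; ++i) {                /* trivial initial conditions */
        mass[i] = 1.0 / N;
        pos[i][0] = (double)i / N;
    }
    for (int t = 0; t < 10; ++t)
        step(1e-3);
    printf("x of particle 0 after 10 steps: %g\n", pos[0][0]);
    return 0;
}
```

Production astrophysics codes replace the O(N^2) loop with tree or particle-mesh algorithms and distribute particles across nodes with MPI, but the time-integration structure is the same.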
Engineering and Commercial Uses

High-performance computing (HPC) plays a pivotal role in engineering and commercial sectors by enabling complex simulations and optimizations that drive innovation and cost efficiencies in profit-oriented applications. In industries such as aerospace, automotive, finance, oil and gas, and pharmaceuticals, HPC systems process vast datasets to model real-world scenarios, accelerating design cycles and risk management while minimizing physical prototyping expenses.

In aerospace engineering, HPC is essential for computational fluid dynamics (CFD) simulations that predict aircraft aerodynamics and structural integrity during design phases. For instance, Boeing utilizes HPC clusters to perform detailed CFD analyses for the 777X aircraft, simulating airflow over wings and fuselages to optimize fuel efficiency, achieving up to 20% improvement in fuel use and emissions compared to predecessors through advanced wing designs that reduce drag. These simulations, running on thousands of cores, allow engineers to iterate designs virtually, cutting development time from years to months and avoiding costly wind tunnel tests.[91]

The automotive industry leverages HPC for crash simulations and training machine learning models for autonomous vehicles, enhancing safety and performance predictions. Carmakers like Ford and General Motors employ finite element analysis on supercomputers to model vehicle collisions, evaluating material behaviors under extreme forces to refine crumple zones and occupant protection systems. A prominent example is Tesla's Dojo supercomputer (operational 2023–2025), which processed petabytes of video data from the company's vehicle fleet to train neural networks for self-driving capabilities that must ultimately run in real time on in-vehicle hardware, before the project's discontinuation in 2025 in favor of newer AI infrastructure.

In finance, HPC facilitates Monte Carlo simulations for risk assessment and portfolio optimization, handling billions of probabilistic scenarios to forecast market volatilities and comply with regulatory requirements. Major banks such as JPMorgan Chase run these simulations on dedicated HPC infrastructure to value complex derivatives and stress-test assets against economic shocks, thereby improving capital allocation decisions. This capability has become standard since the 2008 financial crisis, with HPC reducing computation times from days to hours for value-at-risk calculations.

Oil and gas exploration relies on HPC for seismic imaging and reservoir modeling, which interpret underground structures to locate hydrocarbon deposits efficiently. Companies like ExxonMobil use reverse time migration algorithms on supercomputers to process terabytes of seismic data, generating high-resolution 3D images that identify viable drilling sites with greater accuracy. Such applications have reduced exploration costs by 30-50% through fewer dry wells and optimized recovery strategies, as evidenced by case studies from the 2010s onward.[92]

Pharmaceutical drug discovery employs HPC for molecular dynamics simulations that screen potential compounds by modeling protein-ligand interactions at atomic scales. Firms like Pfizer accelerate lead identification using GPU-accelerated HPC systems, simulating billions of molecular configurations to predict binding affinities and efficacy, a process that has shortened discovery timelines from 5-7 years to under 3 years in some pipelines since the 2010s.
This approach, powered by tools like GROMACS on clusters, has contributed to breakthroughs in targeted therapies by enabling virtual high-throughput screening of vast chemical libraries.
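As an illustration of the Monte Carlo risk simulations mentioned above, the sketch below estimates a simple one-day value-at-risk figure for a portfolio whose daily return is assumed to be normally distributed; the drift, volatility, confidence level, and scenario count are arbitrary assumptions chosen for clarity, not a model any firm actually uses.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Standard normal variate via the Box-Muller transform. */
static double normal_draw(void)
{
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    const long n = 1000000;        /* simulated market scenarios (assumed) */
    const double mu = 0.0005;      /* assumed daily drift of portfolio return */
    const double sigma = 0.02;     /* assumed daily volatility */
    const double value = 1.0e6;    /* portfolio value in dollars (assumed) */

    double *loss = malloc(n * sizeof *loss);
    srand(42);

    for (long i = 0; i < n; ++i) {
        double ret = mu + sigma * normal_draw();
        loss[i] = -ret * value;    /* positive entries are losses */
    }

    /* 99% one-day value-at-risk: the loss exceeded in only 1% of scenarios. */
    qsort(loss, n, sizeof *loss, cmp_double);
    printf("estimated 99%% one-day VaR: $%.0f\n", loss[(long)(0.99 * n)]);

    free(loss);
    return 0;
}
```

In practice the scenario loop is what gets parallelized, either with OpenMP threads or by distributing independent batches of scenarios across MPI ranks, since each draw is independent of the others.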
Modern Implementations
HPC in the Cloud
High-performance computing (HPC) in the cloud refers to the provision of scalable computational resources over the internet, allowing users to access powerful clusters without owning physical infrastructure. This model has gained prominence as cloud providers integrate HPC capabilities into their platforms, enabling rapid deployment of parallel processing for complex simulations and data analysis. Since the 2010s, major expansions in cloud infrastructure have made HPC accessible to a broader range of organizations, shifting from traditional on-premises systems to virtualized environments that support bursty workloads and global collaboration.[93]

Key providers dominate the cloud HPC landscape. Amazon Web Services (AWS) offers HPC clusters through services like AWS Batch and ParallelCluster, which automate the setup of high-throughput computing environments for tasks such as molecular dynamics and weather modeling.[94] Microsoft Azure provides Azure Batch, a managed service for running large-scale parallel and HPC jobs, supporting containerized workloads and integration with on-premises systems.[95] Google Cloud delivers the Cluster Toolkit (formerly Cloud HPC Toolkit), an open-source tool launched in 2022 to simplify the deployment of turnkey HPC clusters using best practices for networking and storage.[96] These offerings, building on infrastructure expansions since the early 2010s, have democratized access to exascale-level performance for non-specialist users.[97]

Cloud HPC provides several advantages over dedicated hardware. On-demand scaling allows resources to expand or contract dynamically, accommodating variable workloads without overprovisioning.[98] The pay-as-you-go pricing model eliminates upfront investments, converting capital expenditures (CapEx) to operational expenses (OpEx) and potentially reducing costs by up to 90% through spot instances for non-critical jobs.[99] Global accessibility further enhances collaboration, as teams can access compute from any location with internet connectivity, fostering faster iteration in research and development.[100]

Despite these benefits, cloud HPC faces notable challenges. While early cloud networks were limited to 100 Gbps Ethernet, as of 2025 providers offer up to 400 Gbps with low-latency options like RDMA, yet latencies can still exceed those of on-premises InfiniBand interconnects, which achieve sub-microsecond delays essential for tightly coupled simulations.[100][101][102] Security concerns also arise, particularly for sensitive simulations involving proprietary or classified data, requiring robust encryption, compliance with standards like NIST SP 800-223, and multi-factor authentication to mitigate risks of breaches in shared environments.[103]

Hybrid models address some limitations by combining on-premises and cloud resources. Cloud bursting enables organizations to handle peak loads by seamlessly extending local clusters to the cloud, maintaining low-latency operations for core tasks while offloading surges.[104] For instance, Azure supports hybrid bursting through integration with existing HPC systems, allowing automatic resource allocation during high-demand periods.[95]

A prominent case study illustrates these capabilities: NASA's Center for Climate Simulation utilized AWS through the Science Managed Cloud Environment (SMCE) for large-scale climate modeling and data analytics, leveraging cloud bursting to accelerate processing for Earth science projects.
This approach reduced project durations and achieved substantial cost efficiencies, aligning with broader reports of up to 90% savings in HPC workloads via optimized cloud usage.[105][99]

Leading Supercomputers
As of the June 2025 TOP500 ranking (latest available as of November 2025), the world's leading supercomputers are primarily exascale systems capable of over one exaFLOPS of performance on the High-Performance LINPACK benchmark, marking a significant advancement in computational scale for scientific simulations and AI-driven research.[106] The top-ranked system, El Capitan at Lawrence Livermore National Laboratory in the United States, delivers 1.742 exaFLOPS using an HPE Cray EX255a architecture with AMD 4th Gen EPYC CPUs and MI300A accelerators interconnected via Slingshot-11, enabling breakthroughs in stockpile stewardship and real-world scientific applications like climate modeling.[106][68]

In second place, Frontier at Oak Ridge National Laboratory achieves 1.353 exaFLOPS with HPE Cray EX235a nodes featuring AMD 3rd Gen EPYC processors and MI250X GPUs over Slingshot-11, supporting key research in fusion energy, materials science, biosciences, and astrophysics.[106][107] Aurora, third on the list at Argonne National Laboratory, provides 1.012 exaFLOPS through an HPE Cray EX system with Intel Xeon CPU Max 9470 and Data Center GPU Max accelerators connected by Slingshot-11, focusing on fusion simulations, climate analysis, and AI-for-science initiatives such as brain mapping and aircraft design optimization.[106][108]

Other prominent systems include Fugaku at RIKEN in Japan, which sustains 442 petaFLOPS using Fujitsu's A64FX ARM-based processors and Tofu-D interconnect, continuing to excel in broad scientific computations including drug discovery modeling and graph analytics despite its 2020 debut.[106][109] LUMI in Finland, ranking ninth at 379.7 petaFLOPS, employs HPE Cray EX235a with AMD 3rd Gen EPYC CPUs and MI250X GPUs, powering European climate science, weather forecasting, ocean simulations, and renewable energy research while running on hydroelectric power for sustainability.[106]

| Rank | System | Site (Country) | Rmax (TFlop/s) | Key Architecture Components | Power Efficiency (GFlop/s/W) |
|---|---|---|---|---|---|
| 1 | El Capitan | LLNL (USA) | 1,742,000 | AMD 4th Gen EPYC + MI300A, Slingshot-11 | 60.3 |
| 2 | Frontier | ORNL (USA) | 1,353,000 | AMD 3rd Gen EPYC + MI250X GPUs, Slingshot-11 | 55.0 |
| 3 | Aurora | ANL (USA) | 1,012,000 | Intel Xeon CPU Max 9470 + Data Center GPU Max, Slingshot-11 | 26.2 |
| 7 | Fugaku | RIKEN (Japan) | 442,010 | Fujitsu A64FX ARM CPUs, Tofu-D | 14.8 |
| 9 | LUMI | CSC (Finland) | 379,700 | AMD 3rd Gen EPYC + MI250X GPUs, Slingshot-11 | 53.4 |