
K computer

The K computer (京, Kei), developed jointly by Japan's RIKEN research institute and Fujitsu, is a scalar-type supercomputer that achieved a sustained performance of 10.51 petaFLOPS on the LINPACK benchmark, making it the world's fastest supercomputer from June 2011 to June 2012. Installed at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe, Hyōgo Prefecture, it comprises 88,128 SPARC64 VIIIfx eight-core processors running at 2.0 GHz, one per node, for a total of 705,024 cores and a theoretical peak performance of 11.28 petaFLOPS. The system's Tofu interconnect employs a six-dimensional mesh/torus topology to enable high-bandwidth, low-latency communication among nodes, while its water-cooling system dissipates heat from the processors and power supplies, contributing to an overall power consumption of 12.66 megawatts during peak operation. Funded under Japan's Ministry of Education, Culture, Sports, Science and Technology (MEXT) High Performance Computing Infrastructure (HPCI) initiative, the K computer project began in 2006 with the goal of advancing national competitiveness in science and technology by enabling ultra-precise simulations for global challenges such as climate change, disaster prevention, and new drug and materials development. Operational from September 2012 until its decommissioning in 2019 to make way for the successor Fugaku system, it supported over 1,000 research projects annually across diverse fields, including physics, chemistry, and bioinformatics, while delivering an energy efficiency of roughly 0.83 gigaFLOPS per watt. The computer's architecture emphasized scalability and reliability, with a Linux-based operating system, support for Fortran, C/C++, and MPI parallelization, and a parallel file system designed to scale to the 100-petabyte class. It topped the TOP500 list twice, in June and November 2011, and later excelled in benchmarks such as the Graph500 for data-intensive graph processing, underscoring its versatility beyond raw floating-point performance. As a landmark of supercomputing, the K computer not only accelerated breakthroughs in scientific modeling but also influenced global HPC designs through its custom processors and fault-tolerant engineering.
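The headline figures quoted above are mutually consistent and can be reproduced with simple arithmetic. The short Python sketch below is purely illustrative (it is not part of the system's software); it recomputes the theoretical peak, the LINPACK efficiency, and the energy efficiency from the core count, the 16 GFLOPS-per-core rate, and the stated power draw.

    # Back-of-the-envelope check of the headline figures quoted above.
    # All inputs are taken directly from the article text.

    cores = 705_024                 # eight cores per SPARC64 VIIIfx processor
    flops_per_core = 16e9           # 16 GFLOPS per core at 2.0 GHz
    rpeak = cores * flops_per_core  # theoretical peak, in FLOPS
    rmax = 10.51e15                 # sustained LINPACK performance, in FLOPS
    power_w = 12.66e6               # power draw at peak operation, in watts

    print(f"Rpeak             : {rpeak / 1e15:.2f} petaFLOPS")       # ~11.28
    print(f"LINPACK efficiency: {rmax / rpeak * 100:.1f} %")          # ~93.2
    print(f"Energy efficiency : {rmax / power_w / 1e9:.2f} GFLOPS/W") # ~0.83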

History and Development

Origins and Funding

In 2006, the Japanese government, through the Ministry of Education, Culture, Sports, Science and Technology (MEXT), announced the Next-Generation Supercomputing Project as part of a broader national high-performance computing initiative. The effort was designated a key technology of national importance, intended to bolster Japan's competitiveness in science and technology and to address pressing global challenges requiring advanced simulation capabilities. The project focused on creating a petascale supercomputer to support research in areas such as climate modeling, drug discovery, and disaster prevention, aiming to enable breakthroughs that would position Japan at the forefront of computational innovation. Originally planned as a consortium involving Fujitsu, NEC, and Hitachi to develop a hybrid vector-scalar system, the project faced a setback when NEC and Hitachi withdrew in early 2009 due to economic difficulties. Fujitsu was subsequently confirmed as the sole lead developer on May 14, 2009, shifting the design to a fully scalar architecture. The total development cost was approximately 112 billion yen, funded primarily by the national government to ensure shared access for researchers across academia and industry. This investment reflected the strategic priority placed on supercomputing for advancing scientific discovery and industrial applications, with the system intended for operation at RIKEN's facilities in Kobe. Annual operating costs were estimated at around US$10 million, covering maintenance, power, and support to sustain long-term utilization. RIKEN was appointed as the primary operator and coordinator, leveraging its expertise in computational research, while the partnership with Fujitsu combined RIKEN's scientific oversight with Fujitsu's extensive experience in processor and system design; more than 1,000 engineers and researchers took part in the joint effort. The collaboration emphasized indigenous technology development to reduce reliance on foreign systems and foster domestic HPC capabilities.

Design and Construction

Development of the K computer began in 2006 as a joint effort between RIKEN and Fujitsu to create a next-generation supercomputer for scientific research in Japan. Full-scale development followed shortly thereafter, focusing on integrating advanced components tailored for massive parallelism. The first eight racks were shipped to RIKEN's Advanced Institute for Computational Science (AICS) facility in Kobe on September 28, 2010, enabling partial operations for initial testing and validation. Key design choices centered on the adoption of the SPARC64 VIIIfx processor, a customized member of Fujitsu's SPARC64 line optimized for high-performance computing through enhancements in vector processing and power efficiency. The system was engineered to comprise 864 compute racks, each containing 96 compute nodes, for a total of 82,944 compute nodes, providing the parallelism necessary for petascale simulations. This scale was selected to achieve target performance levels while maintaining interconnect efficiency. Construction milestones included the progressive installation of all 864 racks over approximately 11 months, culminating in full system assembly by August 2011 at the AICS facility in Kobe. Central to this effort was the integration of the Tofu interconnect, a six-dimensional mesh/torus network that ensured low-latency communication and scalability across the entire node array. To address the challenges of operating in an earthquake-prone country like Japan, the AICS facility incorporated seismic-resistant structures and countermeasures to safeguard the system's main functions during seismic events. Simultaneously, designers tackled the power and cooling demands of scaling to 10 petaflops by implementing an advanced water-cooling system that managed heat dissipation, reducing CPU temperatures to enhance overall efficiency. These measures allowed the K computer to operate reliably in a seismically active region while meeting ambitious performance goals.

Technical Specifications

Processor and Node Architecture

The K computer's compute nodes each incorporated a single SPARC64 VIIIfx processor, a custom eight-core scalar CPU developed by Fujitsu specifically for high-performance computing applications. Operating at 2.0 GHz, the processor delivered a peak performance of 128 GFLOPS (16 GFLOPS per core) through fused multiply-add (FMA) operations, and each node was equipped with 16 GB of memory for balanced compute and data handling. The overall system scaled to 88,128 such processors, encompassing 705,024 cores distributed across 82,944 compute nodes and 5,184 I/O nodes, enabling massive parallel processing for scientific simulations. Architecturally, the SPARC64 VIIIfx was fabricated on a 45 nm silicon-on-insulator (SOI) process, integrating the memory controller directly on-chip to minimize latency and power overhead while maximizing bandwidth to the DDR3 memory interface. Key features included dual 64-bit SIMD floating-point pipelines per core, enabling 128-bit-wide operations via the HPC Arithmetic Computational Extensions (HPC-ACE) instruction set, which extended the SPARC V9 architecture for the vectorized workloads common in HPC. The processor also incorporated integer multiply-accumulate (MAC) instructions within the HPC-ACE extensions, facilitating efficient accumulation in integer-based algorithms used in fields such as climate modeling and bioinformatics. At the node level, four compute nodes were mounted on each system board, with 24 system boards accommodated per compute rack alongside six I/O system boards, resulting in 96 compute nodes per rack across the system's 864 racks. This dense, water-cooled organization optimized space and thermal management, with every node interconnected via the Tofu network for system-wide coordination.
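The per-core, per-node, and system-level figures above follow from the clock rate and the rack layout. The sketch below is a small illustrative calculation (not an official Fujitsu breakdown); it assumes each core retires two 2-wide SIMD fused multiply-adds per cycle, which is consistent with 16 GFLOPS per core at 2.0 GHz.

    # Illustrative arithmetic for the node and system figures quoted above.
    # Assumption: 2 pipelines x 2 SIMD lanes x 2 flops (multiply + add) per cycle.

    clock_hz = 2.0e9
    flops_per_cycle = 2 * 2 * 2                 # pipelines x SIMD width x (mul+add)
    core_peak = clock_hz * flops_per_cycle      # 16 GFLOPS per core
    node_peak = core_peak * 8                   # one 8-core CPU per node -> 128 GFLOPS

    racks = 864
    compute_nodes = racks * 96                  # 82,944 compute nodes
    io_nodes = racks * 6                        # 5,184 I/O nodes
    total_nodes = compute_nodes + io_nodes      # 88,128 processors in total
    total_cores = total_nodes * 8               # 705,024 cores

    system_peak = total_nodes * node_peak
    print(f"{total_cores:,} cores, peak {system_peak / 1e15:.2f} petaFLOPS")  # 11.28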

Interconnect and Network

The K computer's interconnect, known as Tofu (Torus Fusion), is a proprietary high-performance network developed by Fujitsu to enable efficient communication among the system's nodes. It uses a six-dimensional (6D) mesh/torus topology, structured as the combination of a 3D torus in the x, y, and z dimensions and a 3D mesh/torus in the a, b, and c dimensions, with the a, b, and c dimensions fixed at sizes 2 × 3 × 2 to align with physical hardware constraints and promote fault tolerance. This design provides direct node-to-node links without intermediate switches, ensuring low-latency data transfer and inherent redundancy through multiple routing paths that can bypass defective components. Each node features a Tofu interface with 10 bidirectional links, delivering a peak bandwidth of 10 GB/s per link (5 GB/s in each direction), for an aggregate off-chip bandwidth of 100 GB/s per node. The network supports the full scale of 88,128 nodes, allowing seamless parallel processing across the system while maintaining high bisection bandwidth for balanced communication in distributed workloads. Groups of 12 nodes sharing identical x, y, z coordinates are interconnected via the a, b, and c axes in a mesh/torus fashion, overlaying up to twelve independent 3D tori for optimized local exchanges, while inter-group connections extend the topology globally. Key features include built-in fault tolerance: the system can dynamically reroute traffic around failed nodes, for example by removing a minimal set of four nodes when one fails, without significant performance degradation, supporting reliable operation in large-scale environments. The hierarchical embedding of lower-dimensional tori within the 6D structure further enhances flexibility, enabling users to allocate virtual torus subnetworks for jobs regardless of physical node placement. This fault-tolerant, switchless design contrasts with traditional switched fabrics by reducing single points of failure and simplifying maintenance. The Tofu interconnect's design rationale prioritizes scalability to tens of thousands of nodes, high-bandwidth efficiency to support data-intensive simulations, and low-latency communication to minimize synchronization overhead in parallel applications. By embedding torus properties within each cubic fragment of the 6D network, it achieves superior embeddability and efficiency compared with lower-dimensional alternatives, making it well suited to grand-challenge problems requiring massive inter-node coordination. These attributes contributed to the K computer's ability to sustain over 10 petaflops in real-world scientific computations.
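One way to see why 10 links per node suffice for a 6D topology is to model node addresses as (x, y, z, a, b, c) coordinates with wrap-around. The sketch below is a simplified model, not Fujitsu's routing logic: the x, y, z extents are hypothetical placeholders, and for simplicity every dimension is treated as a torus. In the size-2 dimensions the +1 and -1 hops reach the same neighbour, so the 12 directed hops collapse to 10 distinct links, matching the figure above.

    # Minimal sketch of node addressing in a 6D mesh/torus like Tofu.
    # XYZ extents below are illustrative only; ABC is fixed at 2 x 3 x 2.

    XYZ = (24, 18, 8)      # hypothetical torus extents (not the real machine's)
    ABC = (2, 3, 2)        # fixed local extents, as described above

    def neighbours(coord):
        """Return all coordinates reachable over one single-dimension hop."""
        result = []
        for dim, size in enumerate(XYZ + ABC):
            for step in (-1, +1):
                new = list(coord)
                new[dim] = (new[dim] + step) % size   # wrap-around
                result.append(tuple(new))
        return result

    # 6 dimensions x 2 directions = 12 hops, but the two size-2 dimensions
    # each collapse to a single neighbour, leaving 10 distinct links.
    print(len(set(neighbours((0, 0, 0, 0, 0, 0)))))   # prints 10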

Storage and File System

The K computer's storage infrastructure was built around the Fujitsu Exabyte File System (FEFS), a high-performance parallel file system based on Lustre and tailored to manage the enormous data volumes produced by petascale simulations. FEFS employed a two-layer architecture consisting of a local file system for temporary, high-speed access and a global file system for large-scale, shared storage, with an initial capacity of several tens of petabytes scalable to the 100-petabyte class. This design allowed efficient handling of datasets exceeding hundreds of terabytes, supporting the demands of scientific computing workloads. The storage hardware comprised thousands of Object Storage Server (OSS) nodes, including over 2,400 for the local file system and over 80 for the global file system, integrated with Fujitsu ETERNUS disk arrays configured in RAID5 for speed and RAID6 for capacity and redundancy. These OSS nodes delivered an aggregate bandwidth exceeding 1 TB/s, with measured read throughput reaching 1.31 TB/s on 80% of the system using the IOR benchmark, ensuring sustained high-performance I/O for parallel applications. Six OSS were incorporated per storage rack to distribute load and maintain scalability, connected via the Tofu interconnect for low-latency data transfer. Dedicated I/O nodes, functioning as OSS, handled data movement between the compute nodes and storage layers, minimizing contention and enabling asynchronous transfers through the Tofu network. The design supported up to 20,000 OSS and 20,000 object storage targets (OSTs), allowing dynamic expansion without downtime. Integration with the job scheduler facilitated automatic file staging, in which input data was transferred to local storage prior to job execution and output results were archived to the global file system after completion, optimizing overall workflow efficiency. FEFS emphasized high-throughput access to large simulation datasets via Lustre extensions, including MPI-IO optimizations, file striping across up to 20,000 OSTs, and a 512 KB block size tuned for the system's interconnect. Reliability was enhanced through hardware-level redundancy, such as duplicated components and failover mechanisms, alongside software features like continuous journaling and automatic recovery to prevent data loss during intensive operations. These capabilities ensured robust performance in reliability-critical environments, with minimal downtime even under full-scale usage.
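The striping mentioned above is the core mechanism by which Lustre-style file systems such as FEFS spread a single file's bandwidth across many OSTs. The sketch below is a conceptual illustration rather than FEFS code: the 512 KB stripe size follows the figure quoted above, while the stripe count of eight is a hypothetical per-file setting chosen only for the example.

    # Conceptual sketch of round-robin file striping across OSTs.

    STRIPE_SIZE = 512 * 1024      # bytes per stripe (figure quoted above)
    STRIPE_COUNT = 8              # OSTs assigned to this file (hypothetical)

    def ost_for_offset(offset: int) -> int:
        """Return the index (0..STRIPE_COUNT-1) of the OST holding this byte."""
        stripe_index = offset // STRIPE_SIZE
        return stripe_index % STRIPE_COUNT

    # Sampling every 1 MiB: with 512 KB stripes the samples land on every
    # second OST (0, 2, 4, 6) and wrap back to OST 0 after 4 MiB.
    for offset in range(0, 5 * 1024 * 1024, 1024 * 1024):
        print(f"offset {offset >> 20:>2} MiB -> OST {ost_for_offset(offset)}")

Spreading consecutive stripes over different servers in this way is what lets many clients read or write one large file in parallel without funnelling all traffic through a single OSS.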

Power Consumption and Efficiency

The K computer drew a total of 12.66 MW at full load, encompassing both IT equipment and supporting infrastructure. This high demand was managed through a dedicated power supply arrangement, combining on-site facilities with commercial grid connections, to ensure stable operation during sustained computational tasks. Cooling demands were addressed with a water-cooling system for critical components such as CPUs, interconnect chips, and power supplies, supplemented by air cooling, achieving a power usage effectiveness (PUE) of 1.34 during LINPACK testing. This hierarchical cooling design distributed cold water at 15 ± 1 °C to node-level components while relying on facility-level chilled-water plants, with high-efficiency fans contributing further energy savings compared with traditional air-only systems. The setup supported dense packing of up to 96 compute nodes per rack, minimizing thermal hotspots and enabling reliable performance. Energy efficiency reached 824.6 GFLOPS/kW on the LINPACK benchmark in the June 2011 configuration, reflecting optimized hardware and cooling integration. This metric was bolstered by the low-power SPARC64 VIIIfx processors, each consuming about 58 W while delivering 128 GFLOPS of peak performance through techniques such as low-leakage transistor design and water cooling that kept chip temperatures down. The full configuration later improved to approximately 830 GFLOPS/kW, highlighting the design's focus on balancing high throughput with reduced energy use. Environmental resilience was incorporated via seismic isolation using 49 laminated-rubber dampers, allowing the building to withstand accelerations of up to 200 gal, corresponding to Japanese seismic intensity levels of 5 (no damage) to upper 6 (minor damage), while power distribution was designed to keep operations uninterrupted during potential disruptions.
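The efficiency figures in this section follow the usual definitions: energy efficiency is sustained performance divided by power, and PUE is total facility draw divided by IT load. The sketch below recomputes them from the numbers quoted above; note that it treats the 12.66 MW figure as the power entering the efficiency calculation, and the June 2011 IT load is merely inferred from the published 824.6 GFLOPS/kW, so the final PUE line is only an illustration of the definition, not a measured facility figure.

    # Worked check of the efficiency figures quoted above.

    # Full configuration (November 2011 onward)
    rmax_full_gflops = 10.51e6          # 10.51 petaFLOPS expressed in GFLOPS
    power_full_kw = 12_660              # 12.66 MW
    print(f"Full system: {rmax_full_gflops / power_full_kw:.1f} GFLOPS/kW")   # ~830.2

    # June 2011 configuration: IT power implied by the published efficiency
    rmax_june_gflops = 8.162e6
    eff_june = 824.6                    # GFLOPS/kW
    power_june_kw = rmax_june_gflops / eff_june
    print(f"June 2011 implied IT load: {power_june_kw / 1000:.2f} MW")        # ~9.90

    # PUE relates total facility draw to IT load (illustrative application)
    pue = 1.34
    print(f"Implied facility draw: {power_june_kw * pue / 1000:.1f} MW")      # ~13.3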

Performance and Benchmarks

TOP500 Rankings

The K computer achieved its first TOP500 ranking in June 2011, securing the number one position with an Rmax of 8.162 petaFLOPS on the LINPACK benchmark, obtained with 548,352 processor cores. This result corresponded to 93.0% efficiency relative to its Rpeak of 8.774 petaFLOPS and displaced China's Tianhe-1A from the top spot. The system's partial deployment at this stage highlighted the effectiveness of its SPARC64 VIIIfx processors and Tofu interconnect in delivering high sustained performance. By November 2011, following full deployment with 705,024 cores, the K computer retained the top ranking and became the first supercomputer to exceed 10 petaFLOPS in Rmax, recording 10.51 petaFLOPS against an Rpeak of 11.28 petaFLOPS (the arithmetic behind these efficiency figures is sketched after the table below). This milestone underscored its dominance in the early petaFLOPS era and maintained Japan's lead in supercomputing capability. The K computer held the number one position for two consecutive lists before being overtaken by IBM's Sequoia in June 2012, dropping to number two with an unchanged Rmax of 10.51 petaFLOPS. Over the subsequent years it gradually declined in the rankings as faster systems emerged: number three in November 2012, number four from June 2013 to November 2015, number five in June 2016, number seven in November 2016, number eight in June 2017, number ten in November 2017, number sixteen in June 2018, and number eighteen in November 2018. By June 2019 it had fallen to number twenty, reflecting the rapid advance of global supercomputing performance while its own LINPACK score remained stable at 10.51 petaFLOPS.
Date             Rank   Rmax (petaFLOPS)   Cores
June 2011        1      8.162              548,352
November 2011    1      10.51              705,024
June 2012        2      10.51              705,024
November 2018    18     10.51              705,024
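As referenced above, the efficiency percentages cited in this section can be recomputed directly from the table entries and the corresponding Rpeak values; the short sketch below does exactly that and is included only as a worked check.

    # Recomputation of the LINPACK efficiencies cited in this section
    # (Rmax / Rpeak for the partial June 2011 system and the full system).

    entries = {
        "June 2011 (548,352 cores)": (8.162, 8.774),      # (Rmax, Rpeak), petaFLOPS
        "November 2011 (705,024 cores)": (10.51, 11.28),
    }

    for label, (rmax, rpeak) in entries.items():
        print(f"{label}: {rmax / rpeak * 100:.1f} % of peak")
    # -> 93.0 % and 93.2 %, matching the figures quoted above.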

Other Benchmarks and Achievements

Beyond its dominance in the LINPACK-based TOP500 rankings, the K computer demonstrated strong performance across diverse benchmarks that evaluate aspects such as memory bandwidth, irregular access patterns, and application productivity. In November 2011, at the SC11 conference, the K computer took first place in all four categories of the HPC Challenge Class 1 Awards: Global HPL, Global RandomAccess, Global FFT, and EP STREAM (Triad) per system, highlighting its versatility as the most productive and highest-performing system of the year. The system's architectural efficiency was underscored by its 93.2% attainment of peak performance in the LINPACK benchmark at full deployment in November 2011, a figure calculated as the sustained performance (Rmax of 10.51 petaFLOPS) divided by the theoretical peak (Rpeak of 11.28 petaFLOPS), setting a high standard for utilization. On the energy-efficiency front, the K computer ranked 32nd on the November 2011 Green500 list with 830 MFLOPS per watt, reflecting its balanced design despite its high absolute power draw, amid a global push for sustainable high-performance computing. Technical milestones further validated the K computer's capabilities in real-world simulations. It enabled global non-hydrostatic atmospheric simulations at 14 km horizontal resolution using the NICAM framework, completing multi-year runs that captured fine-scale atmospheric dynamics unattainable on prior systems. In the HPCG benchmark, which stresses memory-bound operations more representative of scientific workloads than LINPACK, the K computer achieved 0.6027 petaFLOPS in November 2018, securing third place globally and maintaining relevance years after its peak standing.

Applications and Scientific Impact

Research Areas

The K computer significantly advanced computational research across multiple scientific domains, primarily climate and weather modeling, earthquake simulation, drug discovery, and materials science. These fields benefited from the system's massive parallel processing capabilities, enabling complex simulations that were previously infeasible at smaller scales. In climate and weather modeling, the K computer supported high-resolution global simulations, such as those using the Nonhydrostatic Icosahedral Atmospheric Model (NICAM) at a 7 km grid spacing, allowing detailed analysis of atmospheric dynamics and precipitation patterns over extended periods. Earthquake simulation leveraged the system's power for modeling seismic waves and tsunami propagation with unprecedented fidelity, aiding disaster prediction and mitigation strategies. In drug discovery, it facilitated molecular dynamics simulations essential for understanding biomolecular interactions, including protein folding pathways relevant to medical applications. Materials science research utilized the K computer for atomistic simulations of novel compounds and manufacturing processes, contributing to advancements in energy and structural materials. The system's user base encompassed academia and industry, with numerous projects allocated annually through RIKEN's competitive proposal system as part of the High Performance Computing Infrastructure (HPCI) framework, fostering interdisciplinary collaboration and innovation. Over its operational lifespan, it supported more than 11,000 individual users and 200 companies, reflecting broad adoption in these research areas. Running a customized Linux-based operating system with architecture-specific drivers, the K computer provided numerical libraries such as PETSc, enabling efficient solution of the large-scale linear systems that arise in simulations across these domains (a minimal illustration of this class of solver appears below). This software environment optimized resource utilization for diverse applications, achieving high node efficiency in production runs.
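As referenced above, libraries such as PETSc supply Krylov-subspace solvers for the sparse linear systems these simulations produce. The toy sketch below illustrates the underlying technique with a plain conjugate-gradient iteration in NumPy on a small test matrix; it does not use PETSc's actual API and is in no way representative of production-scale runs, only of the kind of iteration such libraries parallelize across many nodes.

    # Illustrative conjugate-gradient solve for a symmetric positive-definite
    # system, the class of Krylov method that libraries like PETSc provide at
    # scale. Toy NumPy version, not the PETSc API.

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
        x = np.zeros_like(b)
        r = b - A @ x          # initial residual
        p = r.copy()           # initial search direction
        rs_old = r @ r
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rs_old / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        return x

    # Small 1-D Laplacian test problem (tridiagonal, SPD).
    n = 100
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    x = conjugate_gradient(A, b)
    print(np.allclose(A @ x, b, atol=1e-8))   # True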

Notable Projects

The K computer facilitated groundbreaking simulations of the 2011 Tohoku earthquake, enabling high-resolution modeling of seismic wave propagation, strong ground motions, and tsunami inundation to improve prediction algorithms for future disasters. Researchers used the system's massive parallel capabilities to perform tsunami-coupled finite-difference simulations, achieving unprecedented accuracy in replicating the event's wave dynamics and its impacts on coastal areas, informing improved mitigation strategies. In drug discovery, the K computer supported advanced virtual screening efforts targeting G protein-coupled receptors (GPCRs), key therapeutic targets for numerous diseases. Using molecular simulation software developed in Japan, scientists modeled large-scale biological systems to identify potential ligands, improving binding-pose predictions and accelerating hit identification compared with traditional methods. These efforts included hierarchical approaches combining docking and molecular dynamics simulations, yielding more selective candidates for Class B GPCRs and feeding into organized drug discovery pipelines. Climate modeling on the K computer advanced simulations of typhoon paths and climate scenarios, succeeding earlier systems such as the Earth Simulator. Research teams integrated satellite data, including observations from Himawari-8, with high-resolution non-hydrostatic atmospheric models to predict severe weather events, including heavy rainfall and typhoon trajectories, with updates every 10 minutes for improved forecasting. These kilometer-scale global simulations provided insights into typhoon intensification under warming conditions, aiding disaster mitigation and long-term climate projections. The K computer also contributed to fusion energy research through collaborations with the Japan Atomic Energy Agency, where it enabled large-scale plasma simulations to study confinement behavior in fusion devices such as ITER. These computations modeled energetic-particle migration and turbulence in plasmas, supporting the development of stable confinement strategies for sustainable fusion power. In nanotechnology, simulations on the K computer provided precise calculations for fullerenes, predicting the heat of formation for structures from C60 to larger variants such as C320 with high accuracy. This work advanced understanding of carbon nanomaterial stability and reactivity, informing applications in nanoelectronics and materials design. Overall, these projects exemplified the K computer's role in driving scientific discovery, contributing to over 1,700 publications, including around 390 peer-reviewed papers, across diverse fields by 2018.

Legacy and Shutdown

Successor and Decommissioning

The K computer was decommissioned and shut down on August 30, 2019, following a ceremony at the RIKEN Center for Computational Science in Kobe, Hyōgo Prefecture. After approximately eight years of operation since its installation in 2011, the system's aging hardware had reached the end of its reliable service life, necessitating retirement to facilitate upgrades and avoid operational conflicts during the transition to its successor. That successor, the Fugaku supercomputer, was developed under Japan's Post-K project (also called the FLAGSHIP 2020 Project) as a national initiative led by RIKEN and Fujitsu. Named after an alternate name for Mount Fuji, Fugaku was designed to deliver up to 100 times the application performance of the K computer, beginning with early access in 2020 and reaching full production deployment in 2021. Following the shutdown, significant data from the K computer, accumulated over years of simulations and computations, was backed up starting in mid-August 2019 and migrated to Fugaku to ensure continuity of research workflows. Select components, including panels and system boards, were preserved and donated to institutions such as Kobe City University of Foreign Studies for educational and research purposes, while the facility was reconfigured to support Fugaku's higher power and cooling demands. The transition was driven by evolving computational requirements in fields such as artificial intelligence and large-scale data analysis, which exceeded the K computer's petascale capabilities and demanded exascale-class systems for advanced simulations and data-intensive applications.

Cultural and Infrastructural Impact

The K computer's prominence extended beyond its technical achievements, manifesting in tangible cultural symbols within Japan. In July 2011, the Kobe Municipal Subway's Port Island Minami Station, located near the Advanced Institute for Computational Science, was renamed K Computer Mae Station to commemorate the supercomputer's development and operational launch. The renaming highlighted the machine's role as a national icon of innovation, with the station serving as a daily reminder for commuters of Japan's leadership in high-performance computing. Following the K computer's decommissioning in 2019, the station was renamed Keisan Kagaku Center-mae Station in June 2021, reflecting the transition to its successor while preserving the site's computational heritage. As an emblem of Japanese technological excellence, the K computer received widespread media attention, positioning it as a symbol of the nation's resurgence in global supercomputing after earlier setbacks in the supercomputer race. International coverage emphasized its unprecedented speed and its implications for scientific breakthroughs, while domestic publications such as Highlighting Japan showcased it as a pinnacle of government-backed R&D. Post-decommissioning, components of the system, including compute racks and system boards, were preserved for public exhibition, with parts donated to 13 museums and institutions nationwide; documentation of the machine is maintained by the Information Processing Society of Japan (IPSJ) Computer Museum, underscoring its status as a pivotal artifact in computing history. The K computer's success catalyzed sustained governmental commitment to high-performance computing, with ripple effects that justified further investments in infrastructure and related technologies. A Ministry of Education, Culture, Sports, Science and Technology (MEXT) study quantified these impacts, noting how the project's outcomes informed the Post-K initiative, which received a dedicated budget of 130 billion yen, fostering advancements in processor design and system software that bolstered Japan's position in semiconductors and emerging applications after 2019. RIKEN marked the K computer's contributions through dedicated events and training initiatives that enhanced public and academic engagement. Annual symposiums, such as the joint Hokkaido University-RIKEN event in 2013 and the 2021 R-CCS gathering focused on applications developed on the K computer, facilitated discussions on its use across disciplines and its legacy for future systems. Complementing these, RIKEN's educational programs at the Advanced Institute for Computational Science (AICS) included HPC summer schools and workshops that trained thousands of young researchers in parallel programming and computational techniques directly on the K computer, promoting human capital development in computational science.