Power10
The IBM POWER10 is a high-performance microprocessor family developed by IBM and the tenth generation of the POWER architecture for enterprise servers and computing systems. Announced on August 17, 2020, with first commercial systems available in September 2021, it is fabricated on a 7 nm CMOS process with 18 metal layers and incorporates up to 18 billion transistors per chip. The design emphasizes energy efficiency, data-intensive workloads, and advances in artificial intelligence (AI) and hybrid cloud applications. It was succeeded by the POWER11 processor in 2025.[1][2][3][4][5]

Architecture and Design
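As a quick consistency check on the cache and core figures quoted in this section, the following sketch recomputes chip-level totals from the per-core numbers. The constant and function names are illustrative choices rather than IBM terminology, and the socket count it derives for the Power E1080 is simply 240 cores divided by 15 cores per SCM.

```python
# Per-core figures as quoted in this section.
L1I_KB = 2 * 48      # two 48 KB L1 instruction caches
L1D_KB = 32          # L1 data cache
L2_MB = 2            # private L2 cache
L3_MB = 8            # local L3 region
SCM_CORES = 15       # active cores in a single-chip module


def chip_cache_totals(cores=SCM_CORES):
    """Sum the per-core cache sizes over all active cores on one chip."""
    return {
        "l1i_kb": L1I_KB * cores,
        "l1d_kb": L1D_KB * cores,
        "l2_mb": L2_MB * cores,
        "l3_mb": L3_MB * cores,
    }


def sockets_for(total_cores, cores_per_socket=SCM_CORES):
    """Number of sockets needed to reach total_cores exactly."""
    if total_cores % cores_per_socket:
        raise ValueError("total_cores is not a multiple of cores_per_socket")
    return total_cores // cores_per_socket


if __name__ == "__main__":
    print(chip_cache_totals())
    print("Sockets for 240 cores:", sockets_for(240))
```

With 15 cores the local L3 regions add up to the 120 MB on-chip total, and 240 cores correspond to 16 SCM sockets.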
The POWER10 employs a superscalar symmetric multiprocessor (SMP) design compliant with Power ISA Version 3.1, offering backward compatibility with POWER8 and POWER9 modes.[4] It supports configurations including Dual Chip Modules (DCM) with up to 24 cores per socket, Single Chip Modules (SCM) with up to 15 cores, and Entry Single Chip Modules (eSCM) with up to 8 cores, enabling systems like the Power E1080 to scale to 240 cores overall.[6][4] Each core features simultaneous multithreading (SMT8) for up to 8 threads, a 96 KB L1 instruction cache (2 x 48 KB), a 32 KB L1 data cache, 2 MB of L2 cache, and 8 MB of local L3 cache per core, with a total on-chip L3 cache of up to 120 MB using a non-uniform cache architecture (NUCA) for efficient data access.[4] The processor integrates four Matrix Math Accelerator (MMA) units per core to accelerate AI inferencing, particularly for reduced-precision formats like bfloat16 and INT8.[4]

Performance and Efficiency
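The relative figures in this section can be turned into absolute ones with simple arithmetic. The sketch below is only that: it assumes the 33% SMP link improvement and the 2.6x efficiency gain are exact multipliers, which the source does not guarantee, and the function names are hypothetical.

```python
def implied_baseline(new_value, improvement):
    """Older value implied by a fractional improvement (0.33 means +33%)."""
    return new_value / (1.0 + improvement)


def energy_fraction(efficiency_gain):
    """Share of the old energy needed for the same work at a given perf/watt gain."""
    return 1.0 / efficiency_gain


if __name__ == "__main__":
    link = implied_baseline(128, 0.33)
    print(f"Implied POWER9 SMP link bandwidth: {link:.1f} GB/s")
    print(f"Energy for equal work: {energy_fraction(2.6):.0%} of POWER9")
```

Under these assumptions the POWER9 link works out to roughly 96 GB/s, and the same work on POWER10 would need a little under two fifths of the energy.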
POWER10 delivers up to 3x the performance of POWER9 in targeted workloads, with 2.6x greater energy efficiency and 10–20x faster AI inferencing capabilities, driven by enhanced load/store bandwidth and a quad-issue superscalar execution pipeline operating at frequencies from 2.45 GHz to 4.0 GHz depending on the model.[2][4] Memory support includes the Open Memory Interface (OMI) with Differential DIMMs (DDIMMs) for DDR4 or DDR5, providing up to 409 GB/s bandwidth per chip and capacities reaching 64 TB in high-end systems like the Power E1080.[6][7][4] Interconnects feature PCIe Gen5 slots for I/O expansion and SMP links with 128 GB/s bandwidth per link, a 33% improvement over POWER9, facilitating scalable hybrid cloud and edge deployments.[4]

Security and Applications
Security is a core focus, with pervasive hardware-based encryption including AES-256 for memory (in CTR mode), four times the AES units per core compared to POWER9, and support for quantum-safe cryptography via certified coprocessors (FIPS 140-2 Level 4).[4] Additional protections include hardware mitigations for speculative execution vulnerabilities, secure boot mechanisms, and a low rate of reported Common Vulnerabilities and Exposures (CVEs), making it suitable for mission-critical enterprise environments.[4] The POWER10 powers systems running AIX, IBM i, Linux, and PowerVM virtualization, targeting applications in AI/machine learning, database management (e.g., Oracle), web servers, and data analytics.[6][4]

Overview
Architecture
The IBM Power10 processor adheres to the Power Instruction Set Architecture (ISA) version 3.1, incorporating extensions optimized for artificial intelligence workloads, such as the Matrix Math Accelerator for accelerated matrix operations, and enhanced security features including transparent, pervasive memory encryption and support for quantum-safe cryptography. These extensions build on the foundational Power ISA framework to enable efficient handling of AI inference and training tasks alongside robust protection against modern threats.[7][8]

The core microarchitecture, designated as P10, employs a superscalar, out-of-order execution design with up to 15 cores per single-chip module (SCM), using a chiplet-based approach where core chiplets each contain two cores, supporting simultaneous multithreading up to 8-way (SMT-8) or 4-way (SMT-4) for flexible thread scaling to match workload demands. The processor integrates up to eight core chiplets, facilitating high core densities in enterprise configurations. Fabricated on a 7 nm Samsung process node, the processor die measures 602 mm² and contains approximately 18 billion transistors, enabling dense integration of computational resources while maintaining power efficiency. Clock frequencies reach up to 4 GHz, balancing performance with thermal constraints in multi-chiplet setups.[7][8]

The cache hierarchy is structured for low-latency access and high bandwidth, featuring two 48 KB instruction caches (96 KB total) and a 32 KB data cache at the L1 level per core, a dedicated 2 MB L2 cache per core, and up to 120 MB of shared L3 cache per chip utilizing a non-uniform cache architecture (NUCA) for optimized data locality. This design prioritizes rapid data retrieval for out-of-order execution pipelines, reducing stalls in compute-intensive applications. The Power10 also supports PCIe 5.0 for high-speed I/O connectivity.[7][8]

Key Innovations
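The matrix acceleration described in this section is built around accumulating outer products into a small tile. The following is a conceptual sketch of that idea in plain Python, not a model of any specific MMA instruction: the 4 x 4 tile size is an assumption (4 x 4 x 32-bit values would fill a 512-bit accumulator), and `rank1_update` and `tile_matmul` are hypothetical names.

```python
TILE = 4  # assumed accumulator tile edge: 4 x 4 x 32-bit = 512 bits


def rank1_update(acc, x, y):
    """Add the outer product of x and y to the TILE x TILE accumulator in place."""
    for i in range(TILE):
        for j in range(TILE):
            acc[i][j] += x[i] * y[j]


def tile_matmul(a, b):
    """Multiply a (TILE x k) by b (k x TILE) using k rank-1 updates."""
    k = len(b)
    acc = [[0] * TILE for _ in range(TILE)]
    for p in range(k):
        column = [a[i][p] for i in range(TILE)]  # p-th column of a
        row = b[p]                               # p-th row of b
        rank1_update(acc, column, row)
    return acc


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 x 2
    b = [[1, 0, 2, 0], [0, 1, 0, 2]]       # 2 x 4
    for line in tile_matmul(a, b):
        print(line)
```

Keeping the running tile in a dedicated accumulator means each step only needs one column and one row as inputs, which is why this formulation suits wide SIMD hardware.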
The IBM POWER10 processor introduces the Matrix Multiply Assist (MMA) engines, which provide dedicated hardware acceleration for AI inference workloads. Each core features four MMA engines utilizing 256-bit SIMD operations to perform matrix multiplications efficiently, supporting data types such as FP32, BFloat16, and INT8. This enables up to 20x faster AI inference performance for INT8 operations compared to the POWER9 processor, allowing enterprises to process AI tasks directly on the chip without external accelerators.[2][7]

Security enhancements in POWER10 emphasize hardware-based protections to safeguard data in hybrid cloud environments. Transparent memory encryption is implemented pervasively across all volatile memory using AES-CTR mode at the memory controller level, ensuring end-to-end data protection without performance overhead. Additionally, secure boot establishes a chain of trust from the service processor through firmware and operating system components, preventing unauthorized code execution during initialization. Specific features include pointer authentication via cryptographic hashing of return addresses to mitigate return-oriented programming attacks, and full-system encryption that extends protections to persistent memory with AES-XTS mode support.[2][9][7][10]

For I/O connectivity, POWER10 integrates OpenCAPI 3.0 for coherent accelerator attachments and PCIe 5.0 interfaces, delivering up to 64 GB/s bandwidth per slot to support high-performance data movement. These advancements enable multi-petabyte memory clustering and seamless integration with external devices like GPUs and FPGAs. Energy efficiency is improved by a factor of 2.6 in performance per watt over POWER9 at the core level, achieved through the 7 nm semiconductor process and dynamic voltage/frequency scaling via the EnergyScale technology, which optimizes power consumption based on workload demands.[7][4][11]

Design Details
Processor Core
The Power10 processor core employs a superscalar architecture with out-of-order dispatch, enabling efficient parallel execution of instructions while maintaining compatibility with the Power Instruction Set Architecture (PowerISA).[7] Fabricated on a 7 nm process with a 602 mm² die size and approximately 18 billion transistors, this design incorporates advanced branch prediction mechanisms that achieve higher accuracy and lower misprediction flush rates compared to prior generations, supported by deeper and wider instruction windows for improved scheduling.[7] Each core is organized into two execution resource domains, facilitating modular handling of diverse workloads.[12]

The cores support variable simultaneous multithreading (SMT) modes to optimize for different application profiles: SMT-8, which allows up to eight threads per core, is tailored for throughput-oriented workloads and typically pairs with configurations of up to 15 cores per chip, maximizing thread density for tasks like database processing.[12] In contrast, SMT-4 mode, limited to four threads per core, targets compute-intensive applications such as scientific simulations, enabling denser core packing with up to 30 cores per chip to prioritize single-thread performance over multithreading.[12] SMT-2 and single-threaded modes are also available for fine-tuned resource allocation, with automatic workload balancing dynamically adjusting thread counts across modes to enhance efficiency.[13]

Execution units within each core feature an 8-wide dispatch capability in SMT-8 mode, allowing up to eight instructions to be issued per cycle for high instruction-level parallelism.[7] The integer and floating-point pipelines include multiple dedicated paths, such as two quad-precision/decimal floating-point units and enhanced fixed-point operations, ensuring robust handling of scalar computations.[12] Vector processing is powered by eight vector-scalar units (VSUs) per core (four per domain), each 128 bits wide, fully supporting PowerISA Vector Scalar Extension (VSX) for SIMD operations, permutations, cryptography, and other vectorized tasks, with a 512-bit accumulator for precision.[7] The cores integrate four Matrix Math Accelerator (MMA) units to boost AI inferencing through specialized matrix operations.[13]

Power management at the core level leverages IBM's EnergyScale technology for per-core voltage and frequency scaling, dynamically adjusting based on workload demands, thermal constraints, and active thread counts to balance performance and efficiency.[7] Core parking is implemented through workload balancing mechanisms that can reduce active threads per core from eight to as few as one during low-utilization periods, effectively idling resources without full deactivation.[13] These features operate across modes like power-saving (minimum frequency), static (nominal frequency), and maximum performance (up to 4.0 GHz, workload-dependent), with frequencies scaling from 2.0 GHz in low-power states.[12]

I/O and Memory
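The PCIe bandwidth figures quoted in this section follow from the per-lane signalling rate. The sketch below reproduces them, assuming the standard 128b/130b line encoding used by PCIe 5.0 and ignoring protocol overhead above the physical layer; `pcie_gbytes_per_s` is a hypothetical helper name.

```python
def pcie_gbytes_per_s(lanes, gt_per_s=32.0, bidirectional=False):
    """Approximate raw PCIe bandwidth in GB/s for a given lane count.

    Each transfer carries one bit per lane; 128b/130b encoding leaves
    128 payload bits out of every 130 bits on the wire.
    """
    per_lane = gt_per_s * (128 / 130) / 8  # GB/s per lane, one direction
    total = per_lane * lanes
    return 2 * total if bidirectional else total


if __name__ == "__main__":
    print(f"x16 slot, bidirectional: {pcie_gbytes_per_s(16, bidirectional=True):.0f} GB/s")
    print(f"128 lanes, bidirectional: {pcie_gbytes_per_s(128, bidirectional=True):.0f} GB/s")
```

An x16 slot comes out at about 63 GB/s per direction (about 126 GB/s both ways), and 128 lanes at roughly 1 TB/s bidirectional.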
The Power10 processor incorporates advanced I/O subsystems to handle high-bandwidth data transfer, featuring support for PCIe 5.0 with up to 128 lanes per processor module operating at 32 GT/s per lane, providing aggregate bandwidth of up to ~1 TB/s bidirectional depending on configuration (e.g., ~126 GB/s for x16 slots).[14][13] This configuration supports flexible lane bifurcation, such as 1x16, 2x8, or mixed Gen5 and Gen4 setups, facilitating integration with a wide range of adapters including network interfaces, storage controllers, and accelerators while maintaining backward compatibility with earlier PCIe generations.[7] Additionally, the processor includes OpenCAPI 4.0 as a coherent accelerator processor interface, providing up to 25.6 GB/s per link to enable low-latency, cache-coherent communication with external devices like GPUs or specialized memory modules, enhancing data-intensive workloads by allowing direct memory access without CPU intervention.[7][15]

The memory architecture centers on integrated controllers supporting DDR4-3200 or DDR5 via the Open Memory Interface (OMI), with a maximum capacity of 4 TB per socket across up to 32 DDIMM slots for high-speed, buffered access.[13][16][17] These controllers incorporate error-correcting code (ECC) for data integrity and hardware-based encryption via AES in counter mode, applied pervasively to protect against physical attacks without performance overhead.[7] This setup delivers peak bandwidths approaching 410 GB/s per socket for DDR4 or up to 819 GB/s for DDR5, prioritizing reliability and security in enterprise environments.[18]

Internally, the on-chip interconnect employs the X-Bus for communication between chiplets, offering an aggregate bandwidth of 1.6 TB/s to ensure seamless data flow across the processor's modular components, including cores and I/O elements.[19][13] This high-throughput fabric supports the processor's modular design, minimizing latency in intra-socket operations.

Variants
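The core and thread counts of the variants described in this section can be tabulated from the hemisphere layout. This is an illustrative sketch only: the `Variant` class and its field names are hypothetical, and the active-core counts (15 and 30) are taken from the text rather than derived.

```python
from dataclasses import dataclass

HEMISPHERES = 2  # each die is organized into two hemispheres


@dataclass(frozen=True)
class Variant:
    name: str
    cores_per_hemisphere: int  # physical cores in one hemisphere
    active_cores: int          # cores enabled per chip, as quoted in the text
    threads_per_core: int      # SMT width

    @property
    def physical_cores(self):
        return HEMISPHERES * self.cores_per_hemisphere

    @property
    def threads_per_chip(self):
        return self.active_cores * self.threads_per_core

    def cores_per_socket(self, chips=1):
        """Active cores in a socket holding the given number of chips."""
        return self.active_cores * chips


HIGH_END = Variant("high-end (SMT-8)", 8, 15, 8)
COMPUTE = Variant("compute (SMT-4)", 16, 30, 4)

if __name__ == "__main__":
    for v in (HIGH_END, COMPUTE):
        print(v.name, v.physical_cores, v.threads_per_chip, v.cores_per_socket(chips=2))
```

Both variants expose the same number of hardware threads per chip; they differ in how those threads are grouped into cores.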
The IBM Power10 processor employs a modular single-die design, organized into two hemispheres each supporting up to 8 SMT-8 cores or 16 SMT-4 cores, to enable flexible configurations tailored to different workloads.[13] This architecture allows for scalability in core density and threading, supporting up to 30 cores in SMT-8 mode or 60 cores in SMT-4 mode per socket in dual-chip configurations, depending on the selected variant.[13]

The high-end variant is optimized for enterprise workloads, featuring 15 SMT-8 cores per chip to maximize thread-level parallelism and virtualization efficiency in demanding transactional and database environments.[13] In contrast, the compute variant targets high-performance computing (HPC) and artificial intelligence applications, utilizing 30 SMT-4 cores per chip to deliver higher per-core performance for vectorized and matrix-accelerated computations.[13]

Power10 modules are packaged as multi-chip modules (MCMs), incorporating integrated voltage regulators to enhance power delivery efficiency and thermal management across the chips.[13] This MCM approach facilitates seamless integration of compute, I/O, and memory elements within a compact footprint.[13]

Systems
Enterprise
The IBM Power E1080 represents the flagship enterprise server in the Power10 lineup, engineered for mission-critical workloads requiring extreme scalability and unwavering reliability. This multi-node system supports up to four interconnected nodes, enabling configurations with as many as 16 processor sockets and 240 Power10 cores, which deliver robust processing power for consolidating large-scale databases such as IBM Db2 and SAP HANA, as well as enterprise resource planning (ERP) applications like SAP S/4HANA.[7] With a maximum memory capacity of 64 TB across the system (16 TB per node), the E1080 facilitates in-memory analytics and transaction processing at enterprise scale, reducing latency and enhancing data throughput for high-volume operations.[7][6]

Central to the E1080's enterprise suitability are its advanced redundancy features, which minimize downtime in demanding environments. The system incorporates hot-swappable components, including power supplies, NVMe drives, PCIe adapters, fans, and SMP interconnect cables, allowing maintenance without interrupting operations.[7] Power redundancy is achieved through N+2 configurations with dual supplies per node, while cooling employs N+1 fan redundancy to ensure continuous thermal management even under failure conditions.[7] These elements are bolstered by comprehensive reliability, availability, and serviceability (RAS) enhancements, such as first-failure data capture (FFDC), pervasive memory encryption using AES-CTR, concurrent repair capabilities, and automated fault isolation, which collectively target availability exceeding 99.999% for sustained uptime in critical infrastructure.[7]

Storage integration further amplifies the E1080's enterprise prowess, supporting up to four internal NVMe U.2 drives per node for high-speed local access, scalable to 288 drives via up to 12 attached NED24 drawers for massive data repositories.[7] Connectivity to external storage solutions like IBM FlashSystem is seamless through 32 PCIe Gen 5 slots and NVMe over Fibre Channel protocols, enabling terabyte-scale all-flash arrays optimized for database acceleration and hybrid cloud deployments.[7] This architecture allows enterprises to handle petabyte-level workloads with low-latency I/O, integrating directly with IBM's enterprise storage ecosystem for simplified management and data protection.[6]

Mid-range
The IBM Power E1050 serves as the primary mid-range offering in the Power10 systems portfolio, targeting departmental to enterprise-scale workloads that demand scalable performance without the full overhead of high-end configurations. It accommodates 2 to 4 sockets, each equipped with a Power10 processor module featuring 12, 18, or 24 active cores, for a maximum of 96 cores across the system. With support for up to 16 TB of memory via Open Memory Interface (OMI) slots, the E1050 excels in analytics processing and virtualization environments, enabling efficient handling of data-intensive tasks such as database operations and virtual machine orchestration.[12][20]

Housed in a compact 4U rack form factor, the E1050 optimizes space in data centers while providing robust expansion options through integrated I/O drawers. It supports up to four PCIe Gen3 or Gen4 I/O expansion drawers (EMX0), which deliver additional hot-swap slots for PCIe adapters, storage controllers, and networking devices, facilitating modular growth tailored to evolving workload needs.[21][12]

Energy efficiency is a core design principle of the E1050, driven by the 7 nm Power10 processor architecture, which achieves roughly 2.6 times the energy efficiency per socket compared to POWER9-based systems. This focus on power optimization, combined with its mid-tier scalability, results in a lower total cost of ownership (TCO) relative to enterprise-class models that emphasize maximum redundancy and capacity.[12][20]

Scale-out
The IBM Power10 scale-out servers are designed for compact, high-density deployments in cloud, edge, and distributed computing environments, emphasizing horizontal scaling for hyperscale data centers and resource-efficient workloads. These systems prioritize reduced physical footprints while delivering enterprise-grade performance for applications running on AIX, IBM i, and Linux.[22][23]

The primary models in the IBM Power S101x and S102x series include the S1012, S1014, S1022, and S1022s, all based on Power10 processors in 1- or 2-socket configurations. The S1012 is a 1-socket system supporting 1, 4, or 8 cores, offered in a half-wide 2U rack or tower form factor for edge and small-business use. The S1014 provides 1 socket with up to 8 cores and up to 1 TB of memory in a 4U rack or tower chassis, suitable for entry-level distributed tasks. The S1022 offers 2 sockets with up to 40 cores and up to 4 TB of memory in a 2U rack form factor, enabling higher-density compute for cloud-native applications. The S1022s, a cost-optimized variant, supports 2 sockets with up to 16 cores in a similar 2U design, targeting budget-conscious scale-out scenarios. These configurations allow up to 60 cores across the series in multi-node racks, facilitating efficient resource pooling.[24][25][22][23]

Density optimizations in these servers include half-wide chassis options for the S1012, which can reduce IT footprints by up to 75% compared to full-width predecessors, and support for shared power supplies across multiple nodes to minimize rack space and power consumption in hyperscale environments. High-core-per-rack capabilities enable deployments of up to several hundred cores per standard 42U rack, optimizing for containerized and virtualized workloads in dense cloud infrastructures.[24][13]

Networking features integrate high-speed options such as 100 GbE adapters, including SR-IOV-capable ports for virtualized traffic, to support low-latency distributed computing. These servers are certified for Red Hat OpenShift Container Platform, providing container orchestration with enhanced price-performance over comparable x86 systems for hybrid cloud deployments.[13][26]

Software Ecosystem
Operating Systems
IBM AIX 7.3 provides full native support for Power10 processors, enabling scalability up to 240 cores (1920 hardware threads) in a single logical partition (LPAR).[27] This version includes enhancements for Power10's AI matrix-multiply accelerator, allowing AIX applications to leverage on-chip AI extensions for tasks like inference and training.[27] Additionally, AIX 7.3 supports Live Partition Mobility (LPM), facilitating seamless migration of running AIX partitions between Power10 systems without downtime, provided compatible hardware and PowerVM configurations are in place.[28]

IBM i 7.6 is optimized for business-critical applications on Power10 servers, supporting up to 48 SMT8 cores (384 threads) per partition on Power10 hardware.[29] It integrates tightly with Db2 for i, featuring enhancements such as native multi-factor authentication (MFA) using time-based one-time passcodes, new SQL functionalities including data-change-table-reference for UPDATE and DELETE statements, and improved high-availability options like enhanced Db2 Mirror for real-time data replication across Power10 systems.[30] This integration enables efficient handling of transactional workloads, with Power10-specific optimizations for I/O and processor utilization in enterprise environments.[29]

Several Linux distributions are certified for Power10, leveraging kernel versions 5.9 and later to access processor-specific features like improved cryptography acceleration and perf event support.[31] Red Hat Enterprise Linux (RHEL) 9 and 10 offer full support, with RHEL 10 certified for Power10 in mid-2025, enabling advanced AI workloads and container orchestration via Podman.[32] Ubuntu 22.04 LTS and 24.04 LTS provide robust compatibility, including post-copy migration recovery and absolute clock offset handling for Power10 environments.[33] SUSE Linux Enterprise Server (SLES) 15 SP6 includes Power10 performance enhancements for cryptography via NSS FreeBL and OpenSSL, optimizing secure communications and data processing.[34]

Virtualization and Firmware
Power10 systems leverage PowerVM as the foundational hypervisor for virtualization, enabling logical partitioning (LPAR) to divide physical resources into isolated environments for multiple workloads. This technology supports up to 1,000 LPARs per system on Power10-based servers, facilitating server consolidation and dynamic resource allocation across enterprise-scale configurations.[35] PowerVM also incorporates Active Memory Expansion, a feature that uses real-time compression to expand the effective memory capacity of an LPAR by up to 4x without requiring additional physical RAM, thereby optimizing utilization in memory-intensive applications.[36]

For Linux-centric deployments, the OpenPower Abstraction Layer (OPAL) serves as an open-source firmware alternative to PowerVM, providing a standardized interface for direct hardware access and enabling runtime reconfiguration of processors, memory, and I/O resources without system reboots. OPAL integrates with the hostboot and skiboot components to support bare-metal Linux installations on Power10, promoting flexibility in open-source ecosystems.[37][38]

The Hardware Management Console (HMC) acts as the centralized management tool for Power10 environments, offering capabilities for provisioning new LPARs, real-time monitoring of system health and performance metrics, and seamless live partition mobility to migrate running workloads between servers with minimal downtime. Version 11 of the HMC, supporting Power10, enhances automation through integration with PowerVM for policy-based resource adjustments.[28][39]

Power10 firmware embeds robust security measures, including an immutable boot process via Secure Boot, which cryptographically verifies the integrity of firmware components during initialization to prevent unauthorized code execution. Additionally, tamper detection mechanisms within the firmware monitor for physical or logical alterations to boot images and hardware, triggering alerts or halts to maintain system trustworthiness in high-security deployments.[40][7]

Performance Analysis
Comparison with POWER9
The IBM POWER10 processor represents an evolutionary advancement over its predecessor, the POWER9, with key architectural enhancements focused on density, efficiency, and workload acceleration, particularly for AI and high-performance computing. Fabricated on a 7 nm Samsung CMOS process node with 18 metal layers, POWER10 achieves significantly higher transistor density compared to POWER9's 14 nm process, enabling more cores and improved power efficiency per core.[13][2]

In terms of core configuration, POWER10 supports up to 15 high-performance cores per single-chip module in SMT-8 mode or up to 30 cores in SMT-4 mode, leveraging the smaller process node for greater integration; this contrasts with POWER9, which tops out at 12 cores in SMT-8 mode or 24 cores in SMT-4 mode per chip.[13][8]

For threading, POWER10 introduces flexible simultaneous multithreading (SMT) options including SMT-8, SMT-4, SMT-2, and single-thread (ST) modes, with SMT-8 providing up to twice the threads per core in high-throughput scenarios compared to POWER9's baseline SMT-4 (though POWER9 also supports SMT-8 in select configurations). This allows POWER10 to better balance throughput and latency for diverse workloads.[13][7]

POWER10 significantly upgrades I/O capabilities, featuring PCIe 5.0 with 32 lanes at 32 GT/s, which doubles the per-lane bandwidth of POWER9's PCIe 4.0 at 16 GT/s, and OpenCAPI 4.0, an evolution from OpenCAPI 3.0 that supports higher-speed coherency and attachment for accelerators. Additionally, symmetric multiprocessing (SMP) interconnect bandwidth reaches 128 GB/s per chip-to-chip link on POWER10, a 33% increase over POWER9.[13]

On the instruction set front, POWER10 implements Power ISA 3.1 (specifically v3.1B), extending POWER9's Power ISA 3.0 with the Matrix-Multiply Assist (MMA) facility, which introduces dedicated vector operations for matrix mathematics optimized for AI inferencing and training, using formats such as BF16, FP16, and INT8 that were previously unavailable on POWER9.[13][41]

| Feature | POWER10 | POWER9 |
|---|---|---|
| Process Node | 7 nm Samsung CMOS, 18 layers | 14 nm GlobalFoundries |
| Cores per Chip | 15 (SMT-8) / 30 (SMT-4) | 12 (SMT-8) / 24 (SMT-4) |
| Threading Modes | SMT-8/4/2/1 (SMT-8 standard) | SMT-8/4/2/1 (SMT-4 baseline) |
| PCIe | Gen 5 (32 GT/s, 32 lanes) | Gen 4 (16 GT/s) |
| OpenCAPI | 4.0 (enhanced ports/speed) | 3.0 |
| ISA Version | 3.1 with MMA (AI vector ops) | 3.0 |
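
The numeric rows of the table above can be expressed as generation-over-generation ratios. The sketch below does this for core counts and PCIe signalling rate; the `SPECS` dictionary and `relative_change` function are hypothetical names, and the values are copied from the table rather than measured.

```python
# Values taken from the comparison table; keys are illustrative.
SPECS = {
    "cores_smt8": {"power10": 15, "power9": 12},
    "cores_smt4": {"power10": 30, "power9": 24},
    "pcie_gt_per_s": {"power10": 32, "power9": 16},
}


def relative_change(metric):
    """Ratio of the POWER10 value to the POWER9 value for one metric."""
    row = SPECS[metric]
    return row["power10"] / row["power9"]


if __name__ == "__main__":
    for metric in SPECS:
        print(f"{metric}: {relative_change(metric):.2f}x")
```

On these figures the per-chip core count grows by a quarter in either SMT mode, while the PCIe per-lane signalling rate doubles.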