Graphcore
Graphcore Limited is a British semiconductor company founded in 2016 in Bristol, United Kingdom, by serial entrepreneurs Nigel Toon and Simon Knowles. It specializes in the design and production of Intelligence Processing Units (IPUs), a type of parallel processor architected specifically to accelerate artificial intelligence and machine learning workloads.[1][2][3]
The company's IPUs emphasize massive on-chip parallelism, with each unit featuring thousands of independent processing cores and integrated memory to handle complex AI models more efficiently than traditional GPUs for certain tasks, supported by the proprietary Poplar software stack for model training and inference.[3][4]
Graphcore raised significant venture funding, including a $32 million Series A round led by Robert Bosch Venture Capital in 2016, and achieved unicorn status in 2018 amid the AI hardware boom, before being acquired by SoftBank Group in 2024 as a wholly owned subsidiary to bolster SoftBank's global AI compute capabilities.[2][5]
In 2025, Graphcore announced plans to invest up to £1 billion over the next decade in India, establishing an AI Engineering Campus in Bengaluru to create 500 semiconductor jobs and expand research in AI infrastructure.[6][7]
Founding and Early Development
Inception and Founders
Graphcore was founded on 14 November 2016 in Bristol, United Kingdom, by serial entrepreneurs Nigel Toon and Simon Knowles, who respectively assumed the roles of chief executive officer and chief technology officer.[1][2] The company emerged from a stealth development phase that began around late 2013, with formal incorporation aimed at creating specialized processors to address the limitations of conventional GPUs and CPUs on machine learning workloads.[8] The inception of Graphcore traces to January 2012, when Toon and Knowles met at the Marlborough Tavern in Bath to brainstorm opportunities following the exits from their prior ventures in semiconductor design.[9][10]

Both founders brought extensive experience in processor innovation: Toon had served as CEO of two venture-backed firms, picoChip (acquired by Mindspeed Technologies in 2012) and XMOS, focusing on multicore and embedded processing technologies.[11] Knowles, a physicist and silicon engineer with over 40 years in the field, had co-founded and exited two fabless semiconductor companies, including Icera (acquired by Nvidia in 2011), and contributed to 14 production chips, including early domain-specific architectures for signal processing.[12][13][14] This partnership leveraged Bristol's engineering heritage, rooted in hardware innovation since the 1970s, to pioneer the Intelligence Processing Unit (IPU), a microprocessor optimized for AI inference and training through massive on-chip memory and parallelism.[15] Initial seed funding in 2016, led by Fidelity and including early backers from the founders' networks, enabled prototyping amid a nascent competitive landscape dominated by general-purpose accelerators.[16]
Initial Technology Focus and Prototyping
Graphcore's initial technology efforts concentrated on designing the Intelligence Processing Unit (IPU), a processor architecture optimized for machine intelligence applications, distinguished from graphics processing units (GPUs) by holding the full machine learning model on-chip to minimize data transfer bottlenecks. Founded in 2016 by hardware engineers Nigel Toon and Simon Knowles—veterans of Icera, which was sold to Nvidia in 2011—the company targeted the inefficiencies of existing processors in managing AI's graph-like, probabilistic computations through a massively parallel, MIMD-based structure comprising thousands of lightweight processing threads. This approach prioritized low-precision arithmetic to accelerate inference and training tasks requiring rapid iteration over vast parameter spaces, rather than high-precision numerical simulations.[15][17][18]

Prototyping commenced in 2016 following the company's incorporation in Bristol, UK, with seed investments enabling the fabrication of early IPU silicon to validate the architecture's scalability and performance for AI workloads. These prototypes emphasized on-chip memory hierarchies and interconnects to support synchronous parallelism across processing elements, addressing latency issues inherent in off-chip model storage on GPUs. By mid-2017, this work culminated in the announcement of the Colossus GC2, Graphcore's inaugural IPU—a 16 nm device with 1,216 independent processor tiles delivering mixed-precision floating-point operations at scale. Concurrently, the team co-developed the Poplar software stack to facilitate model mapping onto the hardware, ensuring prototypes could demonstrate end-to-end AI acceleration.[2][17][19]
Core Technology
Intelligence Processing Unit Architecture
The Intelligence Processing Unit (IPU) employs a massively parallel, MIMD architecture comprising thousands of independent processing tiles, each integrating compute and memory to minimize the data movement latency inherent in traditional von Neumann designs.[20] Unlike GPUs, which rely on hierarchical caches and global DRAM, the IPU distributes on-chip SRAM directly within tiles, enabling explicit, high-bandwidth data exchange without implicit caching overhead.[21] This tile-based structure supports Bulk Synchronous Parallel (BSP) execution, sequencing compute phases with collective synchronization and exchange operations across the fabric.[20] Each tile features a single multi-threaded processor core capable of running up to six worker threads alongside a supervisor thread for orchestration, with vectorized floating-point units and dedicated matrix multiply engines delivering 64 multiply-accumulate operations per cycle in half-precision.[20]

In the second-generation IPU (GC200), the chip integrates 1,472 such tiles, providing nearly 9,000 parallel threads and 900 MB of aggregate In-Processor-Memory (SRAM) at 624 KB per tile, yielding aggregate bandwidths exceeding 45 TB/s for local access with latencies around 3.75 ns at 1.6 GHz clock speeds.[3] First-generation IPUs (MK1) featured 1,216 tiles with 304 MiB total SRAM, scaling performance to 124.5 TFLOPS in mixed precision.[21]

The IPU's exchange hierarchy facilitates all-to-all communication via an on-chip torus interconnect with 7.7 TB/s throughput and sub-microsecond latencies for operations like gathers (0.8 µs across the IPU), enabling efficient handling of the irregular, graph-like data flows common in AI models.[21] Off-tile scaling occurs through IPU-Links (64 GB/s bidirectional) and host interfaces, supporting multi-IPU clusters without relying on PCIe bottlenecks.[20] This contrasts with GPU SIMT models, where thread divergence and memory coalescing limit efficiency on non-uniform workloads; IPUs excel in fine-grained parallelism and small-batch inference by partitioning models across tiles with explicit messaging, achieving up to 3-4x speedups over GPUs in graph neural networks.[21]
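The headline aggregates follow directly from the published per-tile figures; the short Python check below uses only numbers quoted in this section (tile count, per-tile SRAM, and worker threads per tile):

```python
# Back-of-the-envelope check of the GC200 aggregates quoted above.

TILES = 1472             # processing tiles per GC200 IPU
SRAM_PER_TILE_KB = 624   # In-Processor-Memory per tile
WORKER_THREADS = 6       # worker threads per tile (plus one supervisor)

total_sram_mib = TILES * SRAM_PER_TILE_KB / 1024   # ~897 MiB, marketed as 900 MB
total_worker_threads = TILES * WORKER_THREADS      # 8,832 -> "nearly 9,000"

print(f"Aggregate SRAM: {total_sram_mib:.0f} MiB")
print(f"Parallel worker threads: {total_worker_threads}")
```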
Key Innovations in Parallel Processing
Graphcore's Intelligence Processing Unit (IPU) introduces a tile-based massively parallel architecture optimized for machine intelligence workloads, featuring 1,472 independent processing tiles per second-generation (MK2) IPU, each capable of executing multiple threads.[3][20] This design enables nearly 9,000 concurrent independent program threads, supporting a Multiple Instruction, Multiple Data (MIMD) execution model in which tiles operate with autonomous control flows, contrasting with the more rigid SIMD paradigms of traditional GPUs.[3][20]

A core innovation lies in the Bulk Synchronous Parallel (BSP) programming model, which structures computation into discrete phases of local tile processing, global synchronization, and inter-tile data exchange via an on-chip all-to-all fabric.[20] This approach minimizes synchronization overhead in highly parallel AI tasks, such as graph-based computations, by enforcing synchronous execution across all tiles per step while allowing round-robin thread scheduling within tiles to hide latencies.[20] Complementing this, each tile integrates local SRAM (624 KB per tile, totaling approximately 900 MB of In-Processor-Memory across the IPU), which colocates compute and data to drastically reduce the memory access bottlenecks inherent in von Neumann architectures.[3][20]

Further enhancements include specialized hardware for vectorized floating-point operations (e.g., FP16 and FP32 with matrix multiply-accumulate units performing 64 operations per cycle) and high-bandwidth collective communication primitives, enabling efficient scaling to pod-level systems interconnecting up to 64,000 IPUs.[20][3] Microbenchmarking reveals that this parallelism yields superior throughput for irregular, data-intensive workloads like deep learning inference, though performance is bounded by exchange-fabric contention under unbalanced loads.[21] These elements collectively address the parallelism demands of large-scale models by prioritizing fine-grained, graph-oriented computation over sequential bottlenecks.[21][20]
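The BSP phases described above can be illustrated with a deliberately simplified host-side simulation; the sketch below models tiles as threads and supersteps as barrier-separated compute and exchange phases. It is illustrative only and uses nothing beyond the Python standard library (it is not Graphcore's Poplar API):

```python
import threading

# Toy model of Bulk Synchronous Parallel (BSP) execution: every "tile"
# computes on its own local data, all tiles align at a barrier, then
# data is exchanged explicitly before the next superstep begins.

N_TILES = 4
barrier = threading.Barrier(N_TILES)
local_mem = list(range(N_TILES))          # each tile's local "SRAM"

def tile(tid: int, supersteps: int) -> None:
    for _ in range(supersteps):
        local_mem[tid] += 1               # compute phase: local data only
        barrier.wait()                    # sync phase: all tiles align
        value = local_mem[(tid + 1) % N_TILES]  # exchange: explicit read
        barrier.wait()                    # all reads finish before writes
        local_mem[tid] = value
        barrier.wait()                    # writes visible before next step

threads = [threading.Thread(target=tile, args=(t, 3)) for t in range(N_TILES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(local_mem)                          # deterministic despite concurrency
```

Because every read and write is fenced by a barrier, the result is deterministic regardless of thread scheduling, which is the property the hardware BSP model exploits to avoid fine-grained locking.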
Software Stack and Ecosystem
Graphcore's software stack is anchored by the Poplar SDK, a comprehensive toolchain co-designed with the Intelligence Processing Unit (IPU) to facilitate graph-based programming for machine intelligence workloads. Billed as the first dedicated graph-software framework for IPUs, Poplar encompasses a graph compiler, runtime environment, and supporting libraries that map computational graphs onto IPU tiles, enabling fine-grained parallelism across thousands of processing elements.[22][23] Developers can program directly in C++ or Python, expressing algorithms as directed acyclic graphs that leverage IPU-specific features like in-memory computation and bulk synchronous parallelism.[22] The SDK integrates with established machine learning frameworks to broaden accessibility. It provides IPU-enabled backends for PyTorch (including PyTorch Geometric for graph neural networks) and TensorFlow/Keras, allowing users to train and run inference on models with minimal code modifications via directives like @ipu_model. PopART, a core component, supports ONNX import/export for model portability, while PopLibs delivers optimized low-level operations such as tensor manipulations and custom kernels.[23][24] These integrations have been updated iteratively, with Poplar SDK 3.1 (December 2022) adding PyTorch 1.13 support and enhanced sparse tensor handling.[24]
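As an illustration of the framework-level workflow, the following sketch shows the general shape of wrapping a standard PyTorch module with PopTorch, the Poplar SDK's PyTorch integration. Names such as poptorch.Options and poptorch.inferenceModel follow the SDK documentation of the 2.x/3.x era and should be checked against the installed release; treat this as a representative outline rather than a verified recipe:

```python
import torch
import poptorch  # ships with the Poplar SDK; assumed installed

# A plain PyTorch model, written with no IPU-specific changes.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

# Wrap for the IPU: PopTorch traces the module, compiles it through
# Poplar, and manages host<->IPU data streaming.
opts = poptorch.Options()
opts.deviceIterations(1)                 # batches processed per host call
ipu_model = poptorch.inferenceModel(model, opts)

x = torch.randn(4, 128)
logits = ipu_model(x)                    # runs on IPU hardware or emulator
print(logits.shape)                      # torch.Size([4, 10])
```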
Complementary tools enhance development and optimization. The PopVision suite includes the Graph Analyser for visualizing IPU graph execution, tile-level performance metrics, and memory usage, alongside the System Analyser for profiling host-IPU interactions. These tools enable debugging of large-scale models distributed across IPU-POD systems.[25] The stack supports containerized environments through Docker Hub images, certified under Docker's Verified Publisher Program since November 2021, facilitating reproducible deployments.[26]
The ecosystem fosters scalability via third-party integrations and community resources. Partnerships, such as UbiOps' IPU support added in July 2023, enable dynamic scaling of training jobs in cloud-like setups. Open-source contributions on GitHub, including PopLibs for reusable primitives, encourage custom extensions, though the stack has been critiqued for demanding expert-level tuning to achieve peak efficiency compared with GPU alternatives.[27][28][29] Following the SoftBank acquisition in 2024, the stack remains centered on Poplar, with ongoing emphasis on large-model support such as efficient fine-tuning of billion-parameter transformers.[30]
Products and Hardware Offerings
IPU Generations and Evolution
Graphcore's first-generation Intelligence Processing Unit (IPU), prototyped in 2016 and commercially launched in 2018, introduced a novel massively parallel architecture designed specifically for AI workloads, featuring thousands of independent processing tiles interconnected via a custom mesh to hold entire machine learning models in on-chip memory, eschewing the data movement bottlenecks of traditional GPUs. This initial design emphasized synchronous parallelism across 1,216 tiles, each containing a single multi-threaded core, enabling high throughput for the graph-based computations central to deep learning.

In July 2020, Graphcore unveiled its second-generation IPU, the Colossus Mk2 GC200, integrated into systems like the IPU-Machine M2000, which roughly tripled on-chip memory to 900 MB per IPU and boosted compute density through refined tile interconnects and enhanced bulk memory management, delivering up to 250 teraFLOPS of 16-bit floating-point performance per IPU while supporting scalable pods for exascale AI training.[31] These advancements addressed limitations in the first generation by improving scalability for large models, with each IPU-Machine housing four IPUs connected via 100 GbE fabric for distributed processing, marking a shift toward production-scale deployments in data centers.

The evolution culminated in the Bow IPU, announced in March 2022 and entering shipment shortly thereafter, which applied TSMC's 3D wafer-on-wafer bonding to stack the second-generation GC200 die face-to-face with a dedicated power-delivery die, enabling 40% higher clock speeds, reduced power consumption, and denser integration without redesigning the underlying processor logic.[32] Bow systems, such as the Bow Pod with four IPUs aggregating 5,888 cores and 1.4 petaFLOPS of AI compute, extended the architecture's efficiency for hyperscale applications, though adoption remained constrained by ecosystem maturity relative to GPU incumbents.[33] This packaging innovation reflected Graphcore's focus on incremental hardware refinement amid competitive pressures, prior to its 2024 acquisition by SoftBank, which redirected resources toward integrated AI infrastructure rather than standalone generational leaps.[34]
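The generational deltas described above can be sanity-checked from figures quoted in this article (Mk1 tile count, SRAM, and TFLOPS from the architecture section; Bow per-IPU compute derived from the four-IPU Bow Pod's 1.4 petaFLOPS); a brief Python sketch:

```python
# Per-generation figures as quoted in this article; Bow per-IPU compute
# is derived from the four-IPU Bow Pod figure (1,400 / 4 = 350 TFLOPS).
generations = {
    "Colossus Mk1 GC2 (2018)":   {"tiles": 1216, "sram_mb": 304, "fp16_tflops": 124.5},
    "Colossus Mk2 GC200 (2020)": {"tiles": 1472, "sram_mb": 900, "fp16_tflops": 250.0},
    "Bow (2022)":                {"tiles": 1472, "sram_mb": 900, "fp16_tflops": 350.0},
}

mk1, mk2, bow = generations.values()  # dict insertion order is preserved
print(f"Mk1 -> Mk2 on-chip memory: x{mk2['sram_mb'] / mk1['sram_mb']:.1f}")       # ~x3.0
print(f"Mk2 -> Bow compute uplift: +{bow['fp16_tflops'] / mk2['fp16_tflops'] - 1:.0%}")  # +40%
```

The ~3x memory ratio and the 40% Bow uplift (matching the higher clocks enabled by wafer-on-wafer power delivery) fall straight out of the quoted numbers.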
Scale-Up Systems like Colossus
Graphcore's scale-up systems, exemplified by configurations like the Colossus IPU clusters, enable datacenter-scale deployment of Intelligence Processing Units (IPUs) through rack-integrated IPU-POD architectures designed for efficient AI model training and inference. Introduced in December 2018, the initial rackscale IPU-POD utilized first-generation Colossus Mk1 IPUs to deliver over 16 petaFLOPS of mixed-precision compute per 42U rack, with systems of 32 such pods scaling to more than 0.5 exaFLOPS.[35] These systems leverage IPU-Link interconnects for low-latency, high-bandwidth communication, minimizing data movement overhead compared to traditional GPU clusters reliant on PCIe or NVLink.[35]

The second-generation systems, launched in July 2020, advanced scalability with the IPU-Machine M2000—a 1U appliance housing four Colossus Mk2 GC200 IPUs, providing 1 petaFLOP of AI compute, up to 900 MB of in-processor memory per IPU, and support for up to 450 GB of exchange memory with 180 TB/s bandwidth.[31] Rack-scale examples include the IPU-POD64, comprising 16 M2000 units for 64 IPUs, and the IPU-POD128 with 32 M2000 units for 128 IPUs, 8.2 TB of total memory, and enhanced scale-out via 100 GbE fabrics.[31][36] These configurations support disaggregated host-to-IPU ratios, allowing flexible integration with standard servers from partners like Dell and HPE, and extend to datacenter-scale clusters of up to 64,000 IPUs.[31][37]

Key features of these scale-up systems emphasize massive parallelism for large models, with the first-generation Colossus Mk1 supporting up to 4,096 IPUs and optimized topologies for graph-based workloads via the Poplar software stack.[38] Power efficiency is highlighted in configurations like 16 Mk2 IPUs delivering 4 petaFLOPS at 7 kW in a 4U unit, though real-world deployment depends on cooling and interconnect density.[39] By 2021, expanded POD designs like the IPU-POD128 facilitated training of models exceeding GPT-scale, with bandwidth projected to exceed 10 PB/s in ultra-scale systems.[36][40]
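Pod-level aggregates follow from the per-appliance specification quoted above (IPU-M2000: 1U, four GC200 IPUs, 1 petaFLOP); the sketch below derives IPU counts, compute, and on-chip SRAM totals. Note that the 8.2 TB figure quoted for the IPU-POD128 additionally counts off-chip exchange memory, which this simple tally of on-chip SRAM omits:

```python
# Rack-level aggregates for second-generation IPU-PODs, derived from
# the per-appliance figures above.
M2000_IPUS = 4          # GC200 IPUs per IPU-M2000 appliance
M2000_PFLOPS = 1.0      # AI compute per appliance
IPU_SRAM_MB = 900       # In-Processor-Memory per IPU

def pod(machines: int) -> dict:
    ipus = machines * M2000_IPUS
    return {
        "m2000_units": machines,
        "ipus": ipus,
        "pflops": machines * M2000_PFLOPS,
        "on_chip_sram_gb": ipus * IPU_SRAM_MB / 1000,
    }

print(pod(16))  # IPU-POD64:  64 IPUs, 16 PFLOPS, ~57.6 GB on-chip SRAM
print(pod(32))  # IPU-POD128: 128 IPUs, 32 PFLOPS, ~115.2 GB on-chip SRAM
```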
Integration with Cloud and Software Tools
Graphcore's Poplar SDK serves as the primary software interface for its Intelligence Processing Units (IPUs), enabling seamless integration with popular machine learning frameworks such as TensorFlow (versions 1 and 2, with full support for TensorFlow XLA compilation) and PyTorch.[22] This co-designed stack facilitates efficient mapping of computational graphs to IPU hardware, supporting features like in-processor memory streaming and parallel execution optimized for AI workloads.[23] Developers can access pre-optimized models and datasets through partnerships, including Hugging Face's Transformers library adapted for IPU acceleration as of May 2022.[41]

Containerization support enhances deployment flexibility, with official Poplar SDK images available on Docker Hub since November 2021, verified under Docker's Publisher Program.[42] These images include tools for interacting with IPUs and running applications in isolated environments. Kubernetes integration is provided for orchestration in scale-up systems like IPU-PODs, allowing automated provisioning and management of IPU clusters alongside frameworks such as Slurm and OpenStack. Additional ecosystem expansions, such as UbiOps platform support added in July 2023, enable dynamic scaling of IPU jobs for training and inference.[27]

For cloud deployment, Graphcore IPUs have been accessible via Microsoft Azure since at least 2020, permitting users to provision IPU instances without on-premises hardware.[38] In June 2022, the cloud provider G-Core Labs launched an IPU Cloud service built on Graphcore hardware, bundling Poplar SDK access for rapid prototyping and production-scale AI tasks.[43] Partnerships with infrastructure providers like Atos for high-performance computing solutions and Pure Storage for data management further extend IPU usability in hybrid cloud environments, though adoption has remained limited compared to GPU-centric alternatives.[44][4]
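Deploying on such cloud IPU instances typically follows the TensorFlow-port pattern mentioned above: configure the IPU system, then build the Keras model under an IPU strategy scope. The sketch below reflects the API shape documented for Graphcore's TensorFlow 2 port (IPUConfig, IPUStrategy); treat it as a representative outline under those assumptions rather than a verified recipe for any particular cloud image:

```python
import tensorflow as tf
from tensorflow.python import ipu  # Graphcore's TensorFlow port (Poplar SDK)

# Configure the IPU system: request one IPU, then build the model inside
# an IPUStrategy scope so Keras operations are placed on the device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()

strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
    # model.fit(dataset, epochs=1)  # dataset: a tf.data.Dataset pipeline
```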
Funding Trajectory and Financial Challenges
Major Investment Rounds
Graphcore secured its Series B funding round of $30 million on July 20, 2017, led by Atomico with participation from investors including Samsung Catalyst Fund, Dell Technologies Capital, Amadeus Capital Partners, Foundation Capital, Pitango Venture Capital, C4 Ventures, and Robert Bosch Venture Capital.[45] This round supported the development of its Intelligence Processing Unit (IPU) technology for machine learning applications.[45] The company followed with a Series C round of $50 million in November 2017, led by Sequoia Capital and including Dell as a participant. In December 2018, Graphcore closed a $200 million Series D round, achieving unicorn status with a post-money valuation of $1.7 billion; key investors included Microsoft, BMW i Ventures, Sofina, Merian Global Investors (now Chrysalis Investments), and Draper Esprit.[46][2] This funding accelerated IPU production scaling and partnerships for AI hardware deployment.[46]

Graphcore extended its Series D with an additional $150 million raised on February 25, 2020, from investors including Baillie Gifford, Mayfair Equity Partners, and Chrysalis Investments, bringing the total for the round to approximately $350 million and elevating the valuation to $1.95 billion.[47] The final major venture round was the Series E, closing at $222 million on December 29, 2020, led by the Ontario Teachers' Pension Plan with support from Schroders, Fidelity International, and existing backers, resulting in a $2.77 billion valuation.[48] Across these rounds from 2017 to 2020, Graphcore raised over $700 million in total equity funding to fuel R&D and market expansion amid competition in AI accelerators.[49]
Revenue Realities Versus Valuation Hype
Graphcore's valuation surged amid the AI hardware boom, reaching a post-money valuation of $2.77 billion in December 2020 following a $222 million funding round led by the Ontario Teachers' Pension Plan, positioning it as a high-profile challenger to Nvidia in specialized AI processing.[50] This peak reflected investor enthusiasm for its Intelligence Processing Unit (IPU) technology, with earlier rounds including a $200 million Series D in 2018 that elevated it to unicorn status at approximately $1.7 billion.[51] However, these valuations were driven more by speculative promise than operational traction, as the company invested heavily in R&D and scaling without commensurate commercial uptake.

In stark contrast, Graphcore's revenue remained negligible relative to its funding and hype. For the year ended December 31, 2022, revenue totaled just $2.7 million, a 46% decline from 2021, amid broader market challenges in AI chip adoption beyond dominant GPU ecosystems.[52] Pre-tax losses ballooned to $205 million that year, reflecting a high operational burn rate from a workforce of around 500 and expansive hardware development, with cash reserves strained despite over $700 million raised cumulatively.[53] These figures underscored a core disconnect: while Graphcore marketed IPUs as superior for certain machine learning workloads via massive on-chip memory and parallelism, customer inertia toward established Nvidia CUDA software stacks limited deployments, leaving revenue at a fraction of a percent of the company's valuation.[54]

The valuation-revenue mismatch culminated in SoftBank's 2024 acquisition for an estimated $500-600 million—less than a quarter of the 2020 peak—effectively a down round that wiped out significant investor gains and highlighted over-optimism in early-stage AI hardware bets.[55] Pre-acquisition filings revealed ongoing struggles to convert pilot programs into scalable sales, with revenue growth stymied by ecosystem lock-in and competition, prompting headcount reductions of over 20% by late 2022.[52] This trajectory exemplifies how venture capital in AI semiconductors often prioritized technological novelty over proven market fit, producing hype-fueled multiples unsupported by fundamentals.
Acquisition and Strategic Shifts
SoftBank Takeover in 2024
On July 11, 2024, SoftBank Group Corp. announced the acquisition of Graphcore, the UK-based developer of Intelligence Processing Units (IPUs) for AI workloads, converting it into a wholly owned subsidiary.[56][57] The financial terms were not officially disclosed, though reports indicated a purchase price ranging from approximately $400 million to over $600 million, a sharp decline from Graphcore's peak valuation of $2.8 billion in 2020.[55][58][59] The transaction followed months of speculation, as Graphcore had been seeking buyers since at least February 2024 amid competitive pressures in an AI chip market dominated by Nvidia and ongoing financial strains, including just $4 million in revenue for 2023 despite over $700 million in cumulative investment.[58][60]

Graphcore's CEO Nigel Toon described the deal as a "positive outcome" that would enable accelerated development of next-generation AI compute infrastructure under SoftBank's resources, emphasizing continuity in operations and integration with SoftBank's broader AI ambitions, including synergies with its Arm Holdings subsidiary.[61] SoftBank, led by Masayoshi Son, positioned the acquisition as part of its strategic push toward artificial general intelligence (AGI), leveraging Graphcore's IPU technology for scalable AI training and inference systems.[52] The move marked SoftBank's second major UK semiconductor investment, following its 2016 purchase of Arm for $32 billion, and reflected a pattern of acquiring distressed AI hardware innovators to bolster its ecosystem amid global chip shortages and escalating demand for alternatives to GPU-centric architectures.[62]

The acquisition faced no major regulatory hurdles and closed promptly, with Graphcore retaining its Bristol headquarters and commitment to UK-based R&D, though it highlighted broader challenges for European AI startups in scaling against US incumbents.[63][54] Industry analysts noted that while Graphcore's MIMD-based IPUs offered theoretical advantages in certain parallel processing tasks over Nvidia's SIMT GPUs, persistent ecosystem lock-in and slower market adoption had eroded its standalone viability, making SoftBank's deep pockets essential for survival.[64]
Post-Acquisition Expansions and Plans
Following its acquisition by SoftBank Group Corp. on July 11, 2024, Graphcore announced intentions to expand hiring in the United Kingdom and globally to bolster its engineering and research capabilities.[65][66] This included a renewed recruitment drive starting in November 2024, targeting roles in AI hardware development and software optimization to align with SoftBank's broader artificial intelligence infrastructure goals.[66]

A key post-acquisition initiative materialized in October 2025, when Graphcore, as a SoftBank subsidiary, committed £1 billion (approximately $1.3 billion) to infrastructure development in India over the next decade.[65][6] The investment focuses on scaling AI chip research and development, including the establishment of an AI Engineering Campus in Bengaluru as Graphcore's first office in the country.[67][68] This expansion aims to create up to 500 semiconductor-related jobs, emphasizing design, fabrication support, and integration of Intelligence Processing Units (IPUs) for AI workloads.[69][6] The India plans integrate with SoftBank's global AI compute strategy, which includes multi-trillion-dollar commitments to advanced computing resources, positioning Graphcore's IPU technology as a complementary asset to GPU-dominant ecosystems.[68] No further large-scale geographic expansions or product roadmap shifts have been publicly detailed as of October 2025, though the acquisition has enabled Graphcore to leverage SoftBank's resources for sustained R&D amid prior commercial challenges.[69]
Competitive Landscape
Rivalry with Nvidia and GPU Dominance
Graphcore positioned its Intelligence Processing Units (IPUs) as a direct architectural alternative to Nvidia's graphics processing units (GPUs), emphasizing massive on-chip memory (up to 900 MB SRAM per IPU) and fine-grained parallelism tailored for AI training and inference, in contrast to Nvidia's reliance on high-bandwidth memory (HBM) and tensor cores.[70] In benchmarks published by Graphcore in December 2020, the IPU-M2000 system (four MK2 IPUs) demonstrated up to 60x higher throughput and 16x lower latency than a single Nvidia A100 GPU in specific low-latency AI tasks, such as BERT inference.[71] Independent evaluations, including a 2021 arXiv study on cosmological simulations, showed mixed results: Graphcore's MK1 IPU outperformed Nvidia's V100 GPU in some deep neural network training scenarios but lagged in others due to software immaturity.[72] These claims highlighted potential IPU advantages in memory-bound workloads, yet Graphcore's self-reported metrics often compared multi-IPU clusters to single GPUs, drawing skepticism over apples-to-oranges equivalency.[73]

Nvidia maintained overwhelming dominance in the AI accelerator market, capturing an estimated 86% share of AI GPU deployments by 2025, driven by its CUDA software ecosystem, which locked in developers through optimized libraries, vast community support, and seamless integration with frameworks like TensorFlow and PyTorch.[74] This moat proved insurmountable for Graphcore, whose Poplar SDK required significant porting effort from CUDA codebases, limiting adoption among enterprises reliant on Nvidia's mature tooling and supply chain scale.[75] By 2023-2024, Graphcore's annual revenue remained in the low single-digit millions of dollars despite roughly $700 million in funding, while Nvidia's market capitalization climbed past $1 trillion on AI demand, as customers prioritized ecosystem compatibility over raw hardware specifications.[76]

The rivalry underscored GPU dominance as a barrier to IPU penetration: while Graphcore targeted niches like sparse models and edge inference, claiming 11x better price-performance versus Nvidia's DGX A100 in 2020 announcements, real-world scalability issues and Nvidia's iterative GPU advancements (e.g., the H100's tensor performance leaps) eroded these edges.[77] After the 2024 SoftBank acquisition, Graphcore pivoted toward hybrid IPU-GPU integrations, implicitly acknowledging Nvidia's entrenched position rather than pursuing outright displacement.[70] This dynamic reflected broader causal factors in AI hardware: software inertia and network effects favored incumbents, rendering even superior architectures secondary without equivalent developer mindshare.[78]
Performance Benchmarks and Claims
Graphcore has asserted superior performance for its Intelligence Processing Units (IPUs) in specific AI workloads, particularly those benefiting from massive parallelism and sparsity handling via its MIMD architecture. In December 2020, the company claimed its IPU-M2000 system delivered up to 18x higher training throughput and up to 600x higher inference throughput than Nvidia A100 GPUs in select models like BERT and ResNet-50, based on in-house optimizations with the Poplar SDK.[71] These assertions emphasized IPU advantages in memory bandwidth and tile-based processing for irregular computations, contrasting with Nvidia's SIMT GPU approach.[79]

Participation in standardized MLPerf training benchmarks provided more verifiable data. In MLPerf v1.1 (December 2021), Graphcore reported the fastest single-server BERT time-to-train at 10.6 minutes using an IPU-POD system, while its IPU-POD16 achieved 28.3 minutes for ResNet-50, narrowly ahead of Nvidia DGX A100's 29.1 minutes; Graphcore attributed the roughly 24% improvement over its own v1.0 submission to software refinements in the Poplar and PopART frameworks.[80] Earlier, in MLPerf v1.0 (June 2021), results were less favorable, with Graphcore's ResNet-50 time at 37.12 minutes versus Nvidia's 28.77 minutes on DGX A100.[81]
| MLPerf Benchmark | Graphcore Configuration | Graphcore Time-to-Train | Nvidia DGX A100 Time-to-Train | Notes |
|---|---|---|---|---|
| ResNet-50 (v1.0) | IPU-POD (unspecified scale) | 37.12 minutes | 28.77 minutes | Closed division; Nvidia faster despite similar power envelopes.[81][29] |
| ResNet-50 (v1.1) | IPU-POD16 | 28.3 minutes | 29.1 minutes | Graphcore ahead by ~3%, and ~24% faster than its own v1.0 result via software gains; single-server closed division.[80] |
| BERT (v1.1) | IPU-POD (single-server) | 10.6 minutes | Not directly compared (Nvidia multi-node faster overall) | Graphcore's claimed fastest single-server result.[80][82] |
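The margins implied by the table are easy to misread; the short calculation below, using only the times quoted above, separates Graphcore's narrow v1.1 lead over the DGX A100 from its larger round-over-round software improvement:

```python
# Margins implied by the MLPerf ResNet-50 times quoted in the table (minutes).
graphcore_v10, nvidia_v10 = 37.12, 28.77   # MLPerf v1.0 (June 2021)
graphcore_v11, nvidia_v11 = 28.3, 29.1     # MLPerf v1.1 (December 2021)

# v1.0: Nvidia ahead; v1.1: Graphcore narrowly ahead.
print(f"v1.0 Nvidia lead:    {graphcore_v10 / nvidia_v10 - 1:.0%}")    # ~29%
print(f"v1.1 Graphcore lead: {1 - graphcore_v11 / nvidia_v11:.1%}")    # ~2.7%

# Graphcore's v1.0 -> v1.1 improvement on its own submission (software-driven).
print(f"Round-over-round gain: {1 - graphcore_v11 / graphcore_v10:.0%}")  # ~24%
```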