
Nvidia DGX

The NVIDIA DGX platform is a unified ecosystem of purpose-built AI supercomputing systems, integrating high-performance NVIDIA GPUs, scalable infrastructure, and optimized software to enable enterprise-grade artificial intelligence development, training, and deployment across on-premises, cloud, and hybrid environments. Launched in April 2016 with the DGX-1, the world's first deep learning supercomputer, the platform was designed to deliver computational power equivalent to 250 traditional x86 servers through eight interconnected Tesla P100 GPUs based on the Pascal architecture, along with a full suite of deep learning software including DIGITS, cuDNN, and frameworks such as Caffe and Torch. This pioneering system marked NVIDIA's entry into turnkey AI hardware, accelerating the training of complex neural networks and setting the foundation for modern AI scaling laws. Over the years, the lineup has expanded significantly: in 2017, NVIDIA introduced Volta-based DGX systems to advance AI research with enhanced performance, and subsequent iterations, such as the DGX A100 in 2020, incorporated the Ampere architecture for broader AI workloads including analytics and inference. Today, the DGX platform encompasses a range of systems tailored to different scales and needs, including the DGX SuperPOD for multi-user, leadership-class infrastructure and clusters that power supercomputers; the DGX BasePOD as a proven reference architecture for scalable deployments; and enterprise-focused models such as the DGX H100/H200 (with eight H100 or H200 Tensor Core GPUs for universal AI infrastructure) and the DGX B200 (a unified platform for AI factories supporting develop-to-deploy pipelines). For individual developers and researchers, personal options include the DGX Spark—a Grace Blackwell-powered desktop supporting models of up to 200 billion parameters with 128 GB of unified memory—and the DGX Station, offering high-performance training at workstation scale. Central to the platform's efficacy is its optimization for NVIDIA AI Enterprise software, which streamlines workflows, provides pretrained models, and facilitates production deployment, while tools like NVIDIA Mission Control enable full-stack management of AI factory operations. Adopted by 8 of the top 10 global telecommunications companies, 7 of the top 10 pharmaceutical firms, and 10 of the top 10 automotive manufacturers, DGX systems drive innovations in industries such as autonomous vehicles and smart cities, consistently achieving records in MLPerf benchmarks and contributing to energy-efficient supercomputers on the Green500 list.

Overview

Definition and Role in AI

The NVIDIA DGX platform is an integrated hardware-software system designed for artificial intelligence (AI), machine learning, deep learning, and high-performance computing (HPC) workloads, combining multiple GPUs with optimized networking and storage to deliver turnkey AI supercomputing capabilities. As an enterprise-grade solution, it provides a unified ecosystem that accelerates data science pipelines and facilitates the development and deployment of production AI applications, enabling organizations to scale from individual systems to large clusters without custom integration. In the broader AI landscape, DGX plays a pivotal role by empowering enterprises to construct "AI factories"—dedicated infrastructures for generating and refining models at scale. It supports critical applications such as generative AI, drug discovery through accelerated simulations (adopted by 7 of the top 10 global pharmaceutical companies), autonomous vehicle development (utilized by 10 of the top 10 car manufacturers), and climate modeling for high-resolution weather predictions and environmental simulations. As of 2025, the platform has seen widespread adoption, including by 8 of the top 10 global telecommunications companies for network optimization and AI-driven services. NVIDIA's evolution from a graphics processing unit (GPU) manufacturer focused on gaming and visualization to a dominant force in AI computing underscores the centrality of DGX as its flagship product for scalable enterprise deployments. Originally pioneering general-purpose GPU computing through CUDA, NVIDIA shifted toward complete AI systems with the introduction of DGX, transforming raw GPU power into comprehensive platforms that drive industry-wide innovation. DGX systems offer performance ranging from 1 petaFLOP in compact models such as the DGX Spark for developer workflows to exaFLOPS in clustered configurations such as the DGX SuperPOD, powering some of the world's most advanced supercomputers and setting records in MLPerf benchmarks.

Key Architectural Principles

The Nvidia DGX systems embody a unified architecture that tightly integrates multiple GPUs, high-performance CPUs, high-speed interconnects such as NVLink and NVSwitch, and high-capacity storage within a single system, enabling seamless and low-latency data movement across components. This design facilitates direct GPU-to-GPU communication at aggregate bandwidths exceeding hundreds of gigabytes per second, minimizing bottlenecks in data transfer and supporting efficient parallelism for deep learning workloads. By co-locating these elements, DGX eliminates the need for external cabling in intra-node operations, reducing latency to sub-microsecond levels and enhancing overall system coherence. Scalability in DGX architectures spans from compact single-node configurations, such as the desk-side DGX Station, to expansive rack-scale deployments via the DGX SuperPOD framework, which can interconnect thousands of GPUs across clusters. This modular approach employs scalable units that allow incremental expansion without redesign, supporting anywhere from a few GPUs for development to over 9,000 GPUs in production environments for large-scale training. High-speed fabrics—NVLink within nodes and InfiniBand or Ethernet between racks—ensure linear performance scaling, preserving efficiency as system size grows. Power efficiency and form factors in DGX systems prioritize dense compute in varied environments, ranging from compact units like the DGX Spark to full racks, with advanced cooling solutions to manage high thermal loads. Air and liquid cooling mechanisms, including direct liquid cooling on compute trays, capture up to 90% of GPU heat, enabling sustained high-density operations while minimizing energy per computation. These designs support a range of power and cooling options across configurations, achieving up to 25 times better energy efficiency for AI inference compared to prior generations through optimized integration. AI-optimized features in DGX include support for Multi-Instance GPU (MIG) partitioning, which securely divides a single GPU into multiple isolated instances for concurrent workloads, alongside Tensor Cores that accelerate the matrix operations essential for deep learning. Coherent memory access between CPU and GPU, enabled by technologies like NVLink-C2C, allows unified address spaces and direct memory access, boosting productivity by enabling GPUs to access vast CPU memory pools without explicit transfers. These elements collectively streamline AI development, from model training to deployment, by prioritizing hardware-software synergy for low-latency, high-throughput processing.
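The intra-node NVLink fabric is what frameworks such as PyTorch exploit (via NCCL) when one model is trained across all GPUs of a DGX node. The following is a minimal, hypothetical sketch of single-node data-parallel training with DistributedDataParallel; the model, data, and launch configuration are placeholders rather than an NVIDIA-published recipe.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process;
    # NCCL routes the all-reduce traffic over NVLink/NVSwitch inside the node.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)   # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    for _ in range(10):                                     # placeholder training loop
        x = torch.randn(64, 4096, device=f"cuda:{local_rank}")
        loss = ddp_model(x).sum()
        optimizer.zero_grad()
        loss.backward()                                     # gradients all-reduced across GPUs
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=8 train.py (one process per GPU)
```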

History

Inception and Early Models

The Nvidia DGX-1 was announced on April 5, 2016, at the GPU Technology Conference, marking the debut of the world's first purpose-built deep learning supercomputer designed to accelerate AI research and development. Priced at $129,000, the system integrated eight Tesla P100 GPUs with high-speed NVLink interconnects, dual Intel Xeon processors, and substantial memory and storage, all optimized for deep learning training in a compact 3U chassis suitable for data centers. Targeted primarily at academic researchers, enterprises, and AI startups, the DGX-1 aimed to democratize access to supercomputing-scale performance for deep learning tasks without the need for custom-built clusters. Early adoption of the DGX-1 was swift among leading AI organizations, with CEO Jensen Huang personally delivering the first unit to OpenAI in 2016 to support its pioneering work in advanced AI models. The system enabled efficient training of large-scale neural networks on datasets like ImageNet, leveraging its parallel processing capabilities to achieve significant speedups in model development. Initial deployments emphasized integration with CUDA-optimized frameworks such as TensorFlow, Torch, and Caffe, which were pre-configured in the DGX software stack to streamline setup and maximize multi-GPU efficiency for researchers transitioning from CPU-based workflows. By late 2016, dozens of units had shipped to early customers, fostering rapid experimentation in fields like computer vision and natural language processing. In May 2017, Nvidia expanded the DGX lineup with the introduction of the DGX Station at the GPU Technology Conference, positioning it as the first personal AI supercomputer for individual teams and small labs. Featuring four Tesla V100 GPUs in a deskside, liquid-cooled enclosure, the DGX Station delivered high-performance deep learning capabilities—equivalent to hundreds of CPUs—while operating quietly in office environments without requiring data center infrastructure. Priced for accessibility and shipping in the third quarter of 2017, it targeted developers needing plug-and-play prototyping, further broadening DGX's reach beyond enterprise-scale deployments. The launch of DGX systems catalyzed Nvidia's strategic pivot from its historical focus on gaming GPUs to establishing dominance in AI hardware, with the DGX-1's rapid uptake—shipments to nearly 100 organizations by the end of 2016—underscoring growing demand for dedicated AI infrastructure. This shift propelled Nvidia's revenue, enabling the company to lead the market in GPU-accelerated computing and influencing the broader ecosystem's adoption of AI supercomputing for commercial applications.

Advancements in GPU Architectures

Advancements within the Volta architecture continued with the DGX-2 in 2018, which doubled the GPU count to 16 Tesla V100 accelerators compared to the eight in the DGX-1, enabling 2 petaFLOPS of AI performance through enhanced NVSwitch interconnects for multi-GPU scaling. The transition from Volta to Ampere marked a significant evolution in DGX systems, beginning with the introduction of the A100 GPU in the DGX A100 in 2020, which incorporated Multi-Instance GPU (MIG) partitioning, allowing a single GPU to be divided into up to seven isolated instances for improved resource utilization in multi-tenant environments. Additionally, Ampere's third-generation Tensor Cores supported Tensor Float-32 (TF32) precision, delivering up to 20 times faster training throughput for large transformer models relative to the V100's FP32 performance while maintaining numerical accuracy comparable to FP32. The Hopper architecture, debuting with the H100 GPU in 2022, introduced the Transformer Engine, a specialized hardware-software co-design that optimizes FP8 precision for transformer-based models, achieving up to 9 times faster training and 30 times faster inference on large language models compared to the A100. This era continued with the H200 in 2024, which upgraded to 141 GB of HBM3e memory per GPU—nearly double the H100's capacity—enabling the handling of larger models with over 100 billion parameters without excessive sharding, while boosting memory bandwidth to 4.8 TB/s for sustained throughput in generative AI workloads. Advancements extended into integrated CPU-GPU designs with the GH200 Superchip in 2023, which paired a 72-core Arm-based Grace CPU with an H100 GPU via a 900 GB/s NVLink-C2C interconnect, providing coherent memory access and up to 10 times higher bandwidth than traditional PCIe-based systems for AI and HPC tasks. The Blackwell architecture followed in 2024 with the GB200 Superchip, featuring 208 billion transistors across dual GPU dies and delivering up to 30 times faster real-time inference for large language models relative to equivalent H100 configurations, driven by fifth-generation Tensor Cores and enhanced FP4/FP8 support. Culminating these shifts, the DGX Spark, announced in October 2025, offers a compact Grace Blackwell system powered by the GB10 Superchip, integrating a 20-core Arm CPU with a Blackwell GPU to deliver 1 petaFLOP of AI performance in a desktop form factor for developer prototyping. Over this progression, DGX performance scaled from 170 TFLOPS in the DGX-1 to exaFLOP-level clusters, such as those formed by 1,024 GH200 Superchips, while architectural innovations emphasized energy efficiency, including the H100's 700 W TDP, which balanced high compute density with reduced power per operation in AI workloads.
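TF32 illustrates how these architectural gains surface in software: on Ampere and later GPUs, frameworks can route ordinary FP32 matrix multiplications through Tensor Cores without changes to the model code. The snippet below is a minimal PyTorch sketch of toggling that behavior; the matrix sizes are arbitrary placeholders.

```python
import torch

# On Ampere (A100) and newer GPUs, allow FP32 matmuls/convolutions to use
# TF32 Tensor Cores. Tensors stay FP32 in memory; the multiply runs with
# TF32's reduced mantissa and accumulates in FP32.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(8192, 8192, device="cuda")  # placeholder matrices
b = torch.randn(8192, 8192, device="cuda")
c = a @ b  # dispatched to TF32 Tensor Core kernels when allowed above
```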

DGX Systems

Pascal and Volta Systems

The DGX-1, introduced in 2016, marked the debut of the DGX series as a rack-mountable server optimized for deep learning workloads. It integrated eight Tesla P100 GPUs based on the Pascal architecture, providing a total of 128 GB of HBM2 GPU memory. The system featured dual 20-core Intel Xeon E5-2698 v4 CPUs, 512 GB of DDR4-2133 system memory, and storage configured as four 1.92 TB SSDs in RAID 0 for approximately 7.68 TB of capacity. With a peak FP64 performance of 37.6 TFLOPS across the GPUs, the DGX-1 delivered substantial computational power for its era, consuming up to 3,500 W of power in a 3U chassis. This configuration enabled efficient initial training tasks by leveraging NVLink interconnects for high-speed GPU communication. Building on the DGX-1, the NVIDIA DGX Station launched in 2017 as a compact, liquid-cooled tower designed for desktop deployment in small research teams. It housed four NVIDIA Tesla V100 GPUs based on the Volta architecture, offering 64 GB of total HBM2 GPU memory. The system included a single 20-core Intel Xeon E5-2698 v4 CPU and 256 GB of DDR4 system memory, with options for upgrades to 512 GB in later configurations. Targeted at prototyping and development, the DGX Station provided 480 TFLOPS of FP16 performance in a desk-friendly enclosure weighing about 88 pounds, facilitating rapid iteration on AI models without the need for data center infrastructure. The DGX-2, released in 2018, advanced the series with 16 NVIDIA Tesla V100 GPUs spread across two GPU baseboards, yielding 512 GB of total HBM2 GPU memory. It utilized 12 NVSwitches to provide all-to-all NVLink 2.0 connectivity with 2.4 TB/s of bisection bandwidth across the system, enabling seamless multi-GPU scaling. Powered by dual 24-core Intel Xeon Platinum 8168 CPUs and 1.5 TB of DDR4 system memory, the DGX-2 delivered up to 2 petaFLOPS of FP16 tensor performance and 120 TFLOPS of FP64, while drawing a maximum of 10 kW. This setup supported handling large-scale datasets in a single 10U chassis weighing 350 pounds. These Pascal- and Volta-based systems excelled in early deep learning benchmarks, such as ImageNet-1K training, which could be completed in as little as two hours on a single DGX-1, and they facilitated multi-GPU scaling for complex models on extensive datasets, accelerating research in computer vision and natural language processing. However, their high power consumption, exemplified by the DGX-2's 10 kW draw, posed challenges for deployment in power-constrained environments, often requiring dedicated cooling and electrical infrastructure.
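The headline system totals are simple sums of per-GPU peaks, which makes them easy to sanity-check. The short sketch below assumes the published per-GPU figures of roughly 21.2 TFLOPS FP16 for the Tesla P100 and 125 TFLOPS of Tensor Core deep learning throughput for the Tesla V100, and reproduces the 170 TFLOPS (DGX-1) and 2 petaFLOPS (DGX-2) totals quoted in this article; it is a back-of-the-envelope check, not a benchmark.

```python
# Per-GPU peak throughput (TFLOPS), assumed from public datasheet figures.
P100_FP16_TFLOPS = 21.2     # Tesla P100, half precision
V100_TENSOR_TFLOPS = 125.0  # Tesla V100, FP16 Tensor Core (deep learning) peak

dgx1_total = 8 * P100_FP16_TFLOPS      # DGX-1: eight P100 GPUs
dgx2_total = 16 * V100_TENSOR_TFLOPS   # DGX-2: sixteen V100 GPUs

print(f"DGX-1 peak FP16: {dgx1_total:.0f} TFLOPS")          # ~170 TFLOPS
print(f"DGX-2 peak tensor: {dgx2_total / 1000:.1f} PFLOPS")  # 2.0 petaFLOPS
```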

Ampere Systems

The NVIDIA DGX A100 server, introduced in 2020, represents a pivotal advancement in AI infrastructure, featuring eight NVIDIA A100 Tensor Core GPUs with options for 40 GB or 80 GB of HBM2e memory per GPU, providing up to 640 GB of total GPU memory. The system includes 2 TB of system memory, dual AMD EPYC 7742 CPUs with 128 cores total, and 15 TB of NVMe storage, delivering 5 petaFLOPS of FP16 performance when leveraging sparsity acceleration. This configuration enables scalable training and inference for large-scale models, with NVSwitch interconnects ensuring high-bandwidth GPU-to-GPU communication at 600 GB/s. Complementing the server, the DGX Station A100 workstation, launched in 2021, offers a more compact form factor for individual or small-team use, equipped with four A100 GPUs providing up to 320 GB of HBM2e memory, 512 GB of system memory, and a single AMD EPYC 7742 CPU with 64 cores. Its tower design supports air cooling and includes NVMe storage options, making it suitable for on-premises development without dedicated data center infrastructure. A key feature is support for Multi-Instance GPU (MIG) partitioning, allowing the system to accommodate up to eight concurrent users by dividing each GPU into isolated instances for efficient resource sharing in multi-tenant environments. Ampere-based DGX systems introduced significant enhancements in AI efficiency, including the first implementation of TF32 precision alongside FP64, FP32, FP16, and INT8 support, enabling seamless multi-precision computing without code modifications. They achieve up to 20x faster Tensor Core performance in TF32 precision with sparsity compared to V100 FP32 operations, and up to 6x faster AI training on workloads such as BERT compared to V100-based systems, primarily through structured sparsity that doubles Tensor Core throughput by skipping zero-value computations. This sparsity acceleration, combined with memory bandwidth exceeding 2 TB/s per GPU, optimizes utilization for the sparse models prevalent in modern deep learning workloads. The DGX A100 platforms gained widespread adoption in urgent scientific applications, particularly during the COVID-19 pandemic, when systems deployed to institutions such as Argonne National Laboratory accelerated research into treatments, vaccines, and virus transmission modeling. For instance, protein structure prediction workflows were expedited on DGX A100 hardware, enabling faster analysis of viral proteins and supporting drug discovery efforts by generating accurate 3D models in hours rather than days.
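From a user's perspective, each MIG slice of an A100 appears as its own CUDA device, selectable through the standard CUDA_VISIBLE_DEVICES mechanism. The sketch below is a minimal illustration of pinning a process to one MIG instance; the MIG UUID shown is a hypothetical placeholder (real identifiers can be listed with nvidia-smi -L).

```python
import os

# Pin this process to a single MIG instance before CUDA is initialized.
# "MIG-..." is a placeholder; substitute a real MIG UUID from `nvidia-smi -L`.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-00000000-0000-0000-0000-000000000000"

import torch

# The process now sees exactly one CUDA device: the isolated MIG slice.
print(torch.cuda.device_count())      # -> 1
print(torch.cuda.get_device_name(0))  # reports the A100 MIG profile
```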

Hopper Systems

The Hopper-based DGX systems mark a pivotal evolution in NVIDIA's AI infrastructure, emphasizing optimizations for large language models (LLMs) through the introduction of FP8 precision support and the Transformer Engine, which accelerate training and inference by enabling dynamic mixed-precision computations tailored to transformer architectures. These systems leverage the Hopper architecture's fourth-generation Tensor Cores to deliver substantial performance gains over prior generations, particularly in handling models with hundreds of billions to trillions of parameters, while maintaining compatibility with NVIDIA's NVLink interconnects for seamless multi-GPU scaling. The DGX H100 server, launched in 2022, serves as the foundational Hopper system, integrating eight H100 Tensor Core GPUs, each equipped with 80 GB of HBM3 memory for a total of 640 GB of GPU memory. It features dual Intel Xeon Platinum 8480C CPUs with 112 cores total, 2 TB of DDR5 system memory across 32 DIMMs, and approximately 30 TB of high-performance NVMe storage (including 8 × 3.84 TB U.2 SSDs in RAID 0 for data caching). Connected via fourth-generation NVLink and NVSwitch fabric providing 900 GB/s of bidirectional GPU-to-GPU bandwidth, the DGX H100 achieves 32 petaFLOPS of FP8 performance, enabling efficient training of LLMs that require massive parallel compute. Building on this, the DGX H200, released in 2024, enhances memory capacity for generative AI workloads with eight H200 Tensor Core GPUs, each offering 141 GB of HBM3e memory and 4.8 TB/s of bandwidth, resulting in 1,128 GB of total GPU memory. The system retains the dual CPUs and 2 TB of system memory of the H100 but delivers up to 2× faster inference throughput for large language models such as Llama 2 70B compared to the H100, attributed to the increased memory allowing larger batch sizes and reduced data-movement overhead in trillion-parameter model deployments. The DGX GH200, announced in 2023, introduces a CPU-GPU superchip design pairing one NVIDIA Grace CPU (with 480 GB of LPDDR5X memory) and one H100 GPU (with 96 GB of HBM3 memory) per superchip, interconnected via NVLink-C2C at 900 GB/s of bidirectional bandwidth for unified memory access. Configurations scale to clusters of up to 256 superchips, forming a single coherent GPU domain with 1 exaFLOP of AI performance and 144 TB of shared memory, optimized for memory-bound LLM training. The NVIDIA Helios supercomputer, built from four DGX GH200 systems interconnected with Quantum-2 InfiniBand networking, became operational in 2024 to support internal R&D on GPT-scale models and other generative AI applications. These Hopper systems excel in use cases like trillion-parameter LLMs, where their high-bandwidth memory and integrated architectures minimize latency in transformer-based computations, facilitating breakthroughs in generative AI while providing a robust platform for enterprise-scale deployments.
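The Transformer Engine is exposed to developers through NVIDIA's open-source transformer_engine library, which wraps common transformer layers and casts their matrix multiplications to FP8 on Hopper-class GPUs. The snippet below is a minimal sketch adapted from the library's basic usage pattern; the layer sizes are placeholders, and it assumes an H100-class GPU with the transformer-engine package installed.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Drop-in replacement for torch.nn.Linear backed by FP8-capable Tensor Core kernels.
layer = te.Linear(4096, 4096, bias=True).cuda()
inp = torch.randn(32, 4096, device="cuda")

# Delayed-scaling recipe: FP8 scaling factors are derived from recent tensor
# statistics, the scheme the Transformer Engine uses to keep FP8 numerically stable.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)      # forward matmul executed in FP8 on Hopper Tensor Cores

out.sum().backward()      # backward pass also uses FP8 where the recipe allows
```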

Blackwell Systems

The NVIDIA DGX B200, introduced in 2024, is a unified AI system featuring eight NVIDIA Blackwell B200 GPUs with 192 GB of HBM3e each, providing a total of 1.536 TB of high-bandwidth GPU memory across the system with per-GPU memory bandwidth exceeding 5 TB/s. Integrated with dual Intel Xeon Platinum 8570 CPUs (112 cores total) and 2 TB of DDR5 system memory, this configuration delivers up to 40 petaFLOPS of FP8 performance, optimized for training and inference on large-scale models, including support for clusters handling 405-billion-parameter language models. Available in air-cooled or liquid-cooled 10U form factors, the DGX B200 emphasizes seamless scaling through fifth-generation NVLink interconnects, facilitating deployment in data centers for generative AI workloads. The NVIDIA DGX GB200, also introduced in 2024, represents a rack-scale system built around the Blackwell architecture and Grace Blackwell Superchips. The GB200 NVL72 configuration integrates 72 Blackwell GPUs and 36 Grace CPUs in a single liquid-cooled rack, delivering up to 1.4 exaFLOPS of FP8 performance with 13.4 TB of total HBM3e memory and 130 TB/s of low-latency GPU-to-GPU communication via fifth-generation NVLink. This setup enables rapid deployment of scalable units for enterprise AI infrastructure, supporting agile orchestration of trillion-parameter models. Building on this, the DGX SuperPOD received a significant Blackwell update in 2025, evolving into a modular, rack-scale reference architecture for AI factories with configurations like the GB200 NVL72, comprising multiple DGX GB200 nodes. Designed for enterprise AI infrastructure, the updated SuperPOD supports agile orchestration of trillion-parameter models and serves as a blueprint for hyperscale AI operations, as demonstrated in deployments by organizations such as SoftBank as of July 2025. In October 2025, NVIDIA launched the DGX Spark, a compact system powered by the GB10 Grace Blackwell Superchip, which combines a 20-core Arm CPU (10 Cortex-X925 performance cores and 10 Cortex-A725 efficiency cores) with a Blackwell GPU and 128 GB of unified LPDDR5X memory for coherent access across components. Delivering 1 petaFLOP of AI performance, the DGX Spark enables local inference and fine-tuning on models of up to 200 billion parameters without cloud dependency, targeting developers and edge scenarios in a 150 × 150 × 51 mm chassis with 10 GbE networking and NVLink-C2C interconnects. Key innovations in Blackwell-based DGX systems include up to 5x faster training on select MLPerf benchmarks compared to Hopper architectures, driven by advancements in Tensor Core efficiency and precision scaling, alongside ecosystem expansions such as the ASUS Ascent GX10—a partner variant of the DGX Spark that leverages the same GB10 Superchip for 1 petaFLOP of FP4 performance in a desktop-focused setup. These enhancements prioritize integrated CPU-GPU designs for edge-to-cloud workflows, with liquid cooling and software optimizations reducing energy demands while boosting throughput for next-generation AI reasoning workloads.
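The 200-billion-parameter ceiling quoted for the DGX Spark follows largely from its 128 GB of unified memory and the low-precision formats Blackwell supports: at 4 bits per weight, the weights of a 200B-parameter model occupy roughly 100 GB, leaving headroom for activations and the KV cache. The sketch below is a back-of-the-envelope estimate of that arithmetic, not an NVIDIA sizing tool.

```python
def model_weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory footprint of model weights alone, in gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

UNIFIED_MEMORY_GB = 128  # DGX Spark unified LPDDR5X capacity

for bits in (16, 8, 4):  # FP16/BF16, FP8, FP4 quantization levels
    footprint = model_weight_gb(200, bits)
    fits = "fits" if footprint < UNIFIED_MEMORY_GB else "does not fit"
    print(f"200B params @ {bits}-bit: ~{footprint:.0f} GB -> {fits} in 128 GB")
# Only the 4-bit case (~100 GB) leaves room for activations and the KV cache.
```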

Software and Ecosystem

Core Software Stack

The core software stack of NVIDIA DGX systems comprises a suite of proprietary and open-source components optimized for AI and HPC workloads, enabling efficient development, training, and deployment of AI models. This stack is built around NVIDIA DGX OS, a customized distribution of Ubuntu Linux that integrates GPU-accelerated libraries and frameworks, providing a unified environment for enterprise AI workflows. NVIDIA AI Enterprise forms the certified foundation of the stack, offering a comprehensive suite of tools, libraries, and containers designed for production-grade AI. It includes optimized components such as TensorRT for high-performance inference, RAPIDS for accelerated data analytics and machine learning on GPUs, and NeMo for end-to-end generative AI model development and customization. The suite supports popular frameworks including PyTorch, TensorFlow, and JAX, ensuring seamless integration and portability across DGX hardware. At the core are the CUDA and cuDNN libraries, which provide essential GPU acceleration for general-purpose parallel computing and deep neural networks: CUDA enables general-purpose computing on GPUs, while cuDNN delivers optimized primitives for convolutional and recurrent neural networks. In DGX OS 7.3, released in October 2025 and based on Ubuntu 24.04, these libraries are aligned with the latest compatible versions, such as CUDA Toolkit 12.8 and cuDNN 9.7.0, to support advanced training and inference tasks. DGX systems come preloaded with tools from the NVIDIA GPU Cloud (NGC), including optimized containers for frameworks and pre-trained models such as Llama 3.1 from Meta. These containers facilitate full-stack workflows, encompassing data analytics, model training, visualization, and deployment, allowing users to rapidly prototype and scale applications without manual configuration. The stack emphasizes compatibility and security, with CUDA's forward and backward compatibility ensuring support for legacy models and applications across GPU generations. Additionally, security features such as Confidential Computing, available on Hopper- and Blackwell-based DGX systems, protect sensitive AI models and data in use through hardware-enforced memory encryption and isolation.
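Because the framework containers from NGC bundle matched CUDA and cuDNN builds, a quick way to confirm what a given DGX environment actually provides is to query the framework itself. The snippet below is a small, framework-level sanity check using PyTorch; the versions printed depend on the installed container or DGX OS release rather than on this code.

```python
import torch

# Report the GPU software versions visible to the framework in this environment.
print("PyTorch:", torch.__version__)
print("CUDA runtime used by PyTorch:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("GPUs visible:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"  GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")
```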

Deployment and Management

NVIDIA Base Command Manager serves as a comprehensive tool for managing AI and high-performance computing (HPC) clusters, enabling automated provisioning, job scheduling, and real-time monitoring of DGX systems across on-premises, edge, and hybrid cloud environments. It integrates with Kubernetes and Slurm for workload orchestration, allowing enterprises to maximize GPU utilization and streamline infrastructure operations in multi-node DGX deployments. By providing centralized oversight of compute resources, including DGX clusters, Base Command facilitates efficient scaling and reduces deployment complexity for AI workflows. For organizations seeking flexible, on-demand access to DGX resources without extensive on-premises infrastructure, NVIDIA DGX Cloud offers a service model delivered through certified partners such as CoreWeave, enabling DGX-as-a-Service for bursty training and inference workloads. This platform supports seamless integration with the NVIDIA AI Enterprise software stack, allowing users to provision scalable GPU clusters in the cloud while maintaining security and compliance requirements. Partners such as CoreWeave provide dedicated capacity, exemplified by multi-billion-dollar agreements to ensure high-performance compute availability for enterprise applications. In 2025, NVIDIA introduced Mission Control as an advanced management layer for AI factory operations, automating resource allocation, workload management, and performance optimization across DGX-based infrastructures to achieve hyperscale efficiency. This update enables proactive monitoring of system health, reducing downtime through AI-driven alerts and dynamic workload balancing in large-scale deployments. Mission Control integrates with DGX ecosystems to support end-to-end operations, from experimentation to production-scale inference. Best practices for DGX deployment emphasize scalable architectures, starting with DGX BasePOD for storage-optimized, ready-to-deploy configurations that simplify initial setup and integrate tools for enterprise MLOps. For larger operations, scaling to DGX SuperPOD provides leadership-class performance, incorporating validated networking and storage designs to handle exascale training while adhering to deployment guides for power, cooling, and cabling efficiency. Hybrid on-premises and cloud setups, facilitated by tools such as Base Command, ensure compliance with data regulations by combining local control with elastic cloud bursting, minimizing latency for sensitive workloads.
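Under Kubernetes orchestration, GPU capacity on a DGX node is exposed through the NVIDIA device plugin as the nvidia.com/gpu resource, and jobs claim GPUs by declaring limits on that resource. The sketch below uses the official Kubernetes Python client to submit a single-pod training job requesting all eight GPUs of a node; the container image tag, entrypoint, and namespace are hypothetical placeholders.

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig pointed at the DGX cluster

container = client.V1Container(
    name="trainer",
    image="nvcr.io/nvidia/pytorch:25.01-py3",                 # placeholder NGC image tag
    command=["torchrun", "--nproc_per_node=8", "train.py"],   # placeholder entrypoint
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "8"}                        # claim all eight GPUs on one node
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="dgx-training-job"),
    spec=client.V1PodSpec(restart_policy="Never", containers=[container]),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```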

Hardware Components

Accelerators

The accelerators in Nvidia DGX systems form the core of their computational power, evolving through successive GPU architectures to deliver escalating performance for deep learning and HPC workloads. The initial Pascal-based Tesla P100 GPU, introduced in 2016, featured 16 GB of HBM2 memory and delivered 10.6 TFLOPS of FP32 performance, marking a significant step in high-bandwidth memory integration for deep learning acceleration. Subsequent generations built on this foundation, with the Volta architecture's Tesla V100 GPU doubling memory to 32 GB of HBM2 while introducing 640 Tensor Cores to accelerate the mixed-precision matrix operations essential for deep learning. The Ampere architecture advanced further with the A100, offering up to 80 GB of HBM2e memory and introducing Multi-Instance GPU (MIG) technology, which partitions a single GPU into isolated instances for secure multi-tenant environments. The Hopper H100 enhanced this lineage with 80 GB of HBM3 memory (configurable to 94 GB in select variants) and native FP8 precision support in its fourth-generation Tensor Cores, enabling up to 4x faster training compared to the prior generation. The latest Blackwell B200 GPU scales to 192 GB of HBM3e memory and incorporates FP4 precision, targeting exascale inference with dramatically reduced latency. In August 2025, NVIDIA detailed the Blackwell Ultra variant, offering enhanced performance with up to 40 PFLOPS of sparse FP4 Tensor Core throughput at the superchip level and improved energy efficiency for next-generation workloads. Nvidia's superchip designs integrate these GPUs with Arm-based CPUs via high-speed NVLink-C2C interconnects, unifying memory pools for seamless CPU-GPU collaboration. The GH200 Grace Hopper Superchip pairs a 72-core Grace CPU with an H100 GPU, achieving 900 GB/s of bidirectional bandwidth over NVLink-C2C to eliminate traditional bottlenecks in data transfer. Similarly, the GB200 Grace Blackwell Superchip connects a Grace CPU to two B200 GPUs, providing unified access to 864 GB of coherent memory (480 GB of LPDDR5X on the Grace CPU and 384 GB of HBM3e on the two B200 GPUs) optimized for memory-intensive workloads, enhancing data analytics efficiency.

Performance in these accelerators is quantified in floating-point operations per second (FLOPS), with peak values derived from core counts, clock speeds, and precision modes. A basic approximation for FP16 throughput on earlier architectures like Pascal is given by

\[
\text{Peak FP16 (TFLOPS)} = \frac{\text{CUDA cores} \times \text{clock speed (GHz)} \times 2}{1000},
\]

which accounts for the doubling of operations in half precision relative to single precision, though actual peaks also incorporate Tensor Core contributions. For the H100, Tensor Core FP16 performance (with FP32 accumulation) reaches 989 TFLOPS in dense mode and 1,979 TFLOPS with sparsity exploitation, where structured sparsity prunes 50% of weights without accuracy loss, effectively doubling throughput.

Key innovations in these accelerators center on Tensor Cores, specialized hardware for matrix multiply-accumulate (MMA) operations that form the backbone of training and inference. Introduced in Volta, Tensor Cores perform 4x4x4 MMA in mixed precision (e.g., FP16 inputs with FP32 accumulation), delivering up to 125 TFLOPS for deep learning on the V100. Successive generations evolved this capability, with third-generation support for sparse MMA in Ampere, fourth-generation FP8 in Hopper, and fifth-generation FP4 in Blackwell. Programmers access these units via the Warp Matrix Multiply-Accumulate (WMMA) API in CUDA, enabling custom kernels for batched operations on warps, as demonstrated in early implementations achieving 4 TFLOPS on the V100 for half-precision matrix multiplies. This API abstracts low-level PTX instructions, facilitating portable acceleration across DGX systems while preserving fine-grained control.
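As a simple illustration of how such peak figures are derived, the short Python sketch below evaluates the approximation above using assumed Tesla P100 datasheet values (3584 CUDA cores, roughly 1.48 GHz boost clock) and contrasts the H100 dense and sparse Tensor Core numbers quoted in this section; it is a back-of-the-envelope check, not a measurement.

```python
def peak_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Peak-throughput approximation from the formula above:
    cores x clock (GHz) x 2 operations per fused multiply-add, in TFLOPS."""
    return cuda_cores * clock_ghz * 2 / 1000

# Assumed Tesla P100 (SXM2) datasheet values: 3584 CUDA cores, ~1.48 GHz boost clock.
print(f"P100 approximation: {peak_tflops(3584, 1.48):.1f} TFLOPS")  # ~10.6 TFLOPS

# H100 Tensor Core FP16 figures quoted in this section: structured sparsity
# doubles the dense peak by skipping the pruned half of the weights.
h100_dense, h100_sparse = 989, 1979
print(f"H100 sparse/dense ratio: {h100_sparse / h100_dense:.2f}x")  # ~2x
```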

Interconnects and Storage

The Nvidia DGX systems employ high-speed interconnects to facilitate efficient data transfer between GPUs and across clusters, enabling scalable AI workloads. Within a single DGX node, fifth-generation NVLink provides up to 1.8 TB/s of bidirectional bandwidth per GPU in Blackwell-based systems like the DGX B200, allowing seamless all-to-all communication among the eight GPUs via NVLink switches that deliver 14.4 TB/s of aggregate throughput. In Grace Hopper configurations such as the DGX GH200, NVLink-C2C interconnects achieve 900 GB/s of bidirectional bandwidth between the Grace CPU and Hopper GPU, enhancing data movement for memory-intensive tasks. For inter-node and cluster-scale connectivity, DGX systems integrate NVIDIA Quantum-2 NDR InfiniBand networks operating at 400 Gb/s per port, as utilized in DGX SuperPOD architectures for low-latency, high-throughput scaling to exaflop performance. Ethernet options at 400 Gb/s support cloud deployments, while Quantum-2 InfiniBand networking links multiple DGX GH200 systems in clusters such as Helios, providing robust RDMA capabilities for distributed training. In large-scale SuperPOD environments, these interconnects enable aggregate bandwidth exceeding 1 PB/s across multi-rack NVLink domains that can scale to 576 GPUs in GB200 NVL72-based configurations. Storage in DGX systems features integrated NVMe SSDs for high-performance local caching and booting, with configurations such as eight 3.84 TB U.2 NVMe drives in the DGX H200, totaling approximately 30 TB in setups managed via software RAID (mdadm) for redundancy and speed. For larger-scale persistence, DGX BasePOD integrates external parallel filesystems such as Lustre, supporting petabyte-scale deployments with throughput up to hundreds of GB/s to feed AI pipelines without I/O bottlenecks. To handle the thermal demands of these dense interconnects and storage components, Blackwell-era DGX systems like the GB200 incorporate liquid cooling, dissipating heat from high-power components while maintaining operational efficiency in rack-scale setups drawing up to 120 kW per rack. This approach reduces water consumption compared to air-cooled predecessors and supports sustained high-bandwidth operations in SuperPOD clusters.
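The per-node figures above compose in a straightforward way, which the short sketch below illustrates: eight GPUs at 1.8 TB/s of NVLink bandwidth each account for the 14.4 TB/s switch aggregate, and eight 3.84 TB NVMe drives account for the roughly 30 TB of local cache. It is a simple consistency check on the numbers quoted in this section.

```python
# Intra-node NVLink aggregate for a DGX B200: eight GPUs x 1.8 TB/s each.
gpus_per_node = 8
nvlink_tbps_per_gpu = 1.8
print(f"NVLink aggregate: {gpus_per_node * nvlink_tbps_per_gpu:.1f} TB/s")  # 14.4 TB/s

# Local NVMe cache capacity: eight 3.84 TB U.2 drives (before RAID overhead).
drives, drive_tb = 8, 3.84
print(f"Raw NVMe capacity: {drives * drive_tb:.1f} TB")  # ~30.7 TB
```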
