GPU virtualization
GPU virtualization is a computing technique that enables the partitioning and sharing of a single physical graphics processing unit (GPU) among multiple virtual machines (VMs) within a virtualized environment, allowing each VM to access a portion of the GPU's computational resources as if it had dedicated hardware. This approach addresses the challenges of GPU underutilization in cloud and data center settings by providing isolation, scalability, and efficient resource allocation for parallel processing workloads.[1][2]
The development of GPU virtualization emerged in the mid-2000s alongside the rise of general-purpose GPU (GPGPU) computing, with early research focusing on enabling GPU acceleration in virtualized systems to support high-performance computing (HPC) applications. Initial efforts, such as those documented in 2008, introduced taxonomies for virtualization strategies, including API remoting—where GPU commands are forwarded over a network to a remote physical GPU—and direct device pass-through, which assigns an entire GPU to a single VM for near-native performance but limits sharing. By the early 2010s, mediated pass-through techniques like gVirt (introduced in 2014) advanced the field by combining device emulation in software with hardware isolation to support multiple VMs per GPU while running native drivers inside VMs.[3][4][2]
Major implementations have been driven by industry leaders, with NVIDIA's virtual GPU (vGPU) software, first released around 2013 as part of the GRID platform, providing hardware-accelerated sharing for virtual desktops, workstations, and applications across hypervisors like VMware vSphere and KVM. NVIDIA vGPU profiles allocate specific fractions of GPU memory and compute cores to VMs, supporting use cases in AI training, graphics-intensive design, and virtual desktop infrastructure (VDI). Similarly, AMD's MxGPU and partition-based virtualization, integrated into products like the Versal AI Edge Series, divide GPU shaders into isolated slices and partitions, ensuring secure multi-VM access via hardware arbiters and memory management units for embedded and edge computing scenarios. These solutions emphasize time-slicing, spatial partitioning, and fine-grained scheduling to balance performance and fairness.[5][1][6]
Key benefits of GPU virtualization include cost reduction through hardware consolidation, enhanced security via VM isolation to prevent interference, and improved scalability for cloud providers offering GPU-accelerated instances. However, challenges persist, such as overhead from context switching, ensuring equitable resource distribution among VMs, and maintaining low-latency performance for real-time graphics. Ongoing research explores hybrid approaches, including hardware-assisted virtualization compliant with standards like PCI-SIG SR-IOV, to further optimize for emerging demands in machine learning and remote rendering. As of 2025, NVIDIA has released vGPU versions 18 and 19, enhancing support for AI workloads including LLM fine-tuning in virtualized environments.[2][1][6][7]
Introduction
Definition and Principles
GPU virtualization is the process of abstracting a physical graphics processing unit (GPU) to enable multiple virtual machines (VMs) or containers to share its resources for graphics rendering or general-purpose computing tasks, without providing direct hardware access to any individual instance.[8] This abstraction allows a single physical GPU to be partitioned into multiple virtual GPUs (vGPUs), each appearing as a dedicated device to the guest environment, thereby facilitating efficient resource allocation in shared computing setups.[8]
The core principles of GPU virtualization revolve around resource isolation, scheduling, and balancing performance trade-offs. Resource isolation ensures that workloads from different VMs do not interfere with each other, maintaining security and stability by preventing unauthorized access to shared hardware components like memory and processing cores.[8] Scheduling mechanisms, such as time-slicing (where the GPU alternates execution between vGPUs over short intervals) or spatial partitioning (dividing the GPU into concurrent sub-units), manage access to optimize utilization while minimizing latency.[8] These principles involve inherent trade-offs, where sharing efficiency gains come at the cost of overhead compared to native GPU performance, typically resulting in a 3-10% reduction in throughput depending on the workload and virtualization technique.[9]
In the basic architecture, a host-level GPU driver acts as a mediator, intercepting and routing commands from guest VMs to the physical hardware while enforcing isolation policies.[8] Guest environments interact with virtual GPUs through paravirtualized interfaces (requiring guest driver modifications for awareness of the virtualization layer) or fully virtualized interfaces (emulating a complete GPU without guest changes), enabling seamless integration with hypervisors like KVM or VMware.[8]
GPUs play essential roles in both graphics rendering and general-purpose computing, necessitating virtualization to address resource inefficiencies. For graphics, APIs such as OpenGL—a cross-platform standard for high-performance 2D and 3D rendering—and DirectX—Microsoft's suite for hardware-accelerated 2D/3D graphics in multimedia applications—rely on GPUs to process vertex transformations, shading, and rasterization for real-time visuals in games and simulations.[10][11] In general-purpose GPU (GPGPU) computing, frameworks like CUDA (NVIDIA's parallel computing platform) and OpenCL (an open standard for heterogeneous parallel programming) offload compute-intensive tasks such as machine learning and scientific simulations to GPU cores for massive parallelism.[12][13] Virtualization becomes crucial in multi-tenant environments like cloud data centers, where GPUs often remain underutilized due to bursty workloads, leading to inefficient resource pooling without sharing mechanisms.[14]
The benefits of GPU virtualization include significant cost savings through resource pooling, allowing multiple tenants to share expensive hardware, and improved scalability for diverse workloads in cloud infrastructures.[8] However, it introduces limitations such as mediation overhead from command interception and context switching, which can degrade performance, and potential security risks like side-channel attacks or data leakage in shared memory spaces between VMs.[8][15]
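As a simple illustration of this abstraction, a vGPU presented to a guest enumerates as an ordinary PCI graphics device and is managed by a regular guest driver. A minimal check from inside a Linux guest might look like the following sketch (device addresses and reported values are illustrative):

    # Inside a Linux guest: the virtual GPU appears as a normal PCI display device
    lspci -nn | grep -iE 'vga|3d|display'
    # On an NVIDIA vGPU guest with the guest driver installed, nvidia-smi reports only
    # the frame buffer assigned by the host-side vGPU profile, not the whole physical GPU
    nvidia-smi --query-gpu=name,memory.total --format=csv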
Historical Development
The concept of GPU virtualization emerged in the late 2000s, driven by the need to enable graphics acceleration within virtual machines for improved performance in hosted environments. In 2008, researchers at VMware introduced a foundational approach through their paper on hosted I/O architecture, which proposed strategies for sharing GPU resources among multiple virtual machines, focusing initially on graphics workloads to overcome the limitations of software-only rendering. This work laid the groundwork for a taxonomy of GPU virtualization techniques, emphasizing the challenges of direct hardware access in virtualized settings.[16]
The 2010s marked significant commercialization and technical advancements, spurred by virtual desktop infrastructure (VDI) demands. NVIDIA launched GRID in 2012, the industry's first cloud-based GPU solution, enabling workstation-class graphics delivery to remote users across various devices and paving the way for scalable VDI deployments. Concurrently, the Linux kernel introduced mediated device support in 2016, facilitating secure GPU sharing via frameworks like VFIO-mdev, which allowed hypervisors to partition GPU resources without full passthrough.[17][18] AMD followed in 2016 with SR-IOV support in its Radeon Instinct accelerators, introducing MxGPU technology that conformed to the Single Root I/O Virtualization standard for multi-user GPU partitioning. A pivotal academic contribution came from the USENIX ATC 2014 paper on gVirt, which detailed a full GPU virtualization solution using mediated pass-through, enabling native drivers in guest VMs while supporting both graphics and compute workloads. The rise of general-purpose GPU (GPGPU) computing, ignited by NVIDIA's CUDA platform in 2006, further accelerated virtualization efforts, with demands intensifying around the 2017 AI boom as deep learning applications required efficient GPU resource allocation in shared environments.[4][19]
Entering the 2020s, GPU virtualization evolved to address AI and machine learning scalability, shifting focus from primarily VDI to high-performance computing in cloud and containerized setups. NVIDIA introduced Multi-Instance GPU (MIG) with the A100 Tensor Core GPU in 2020, allowing a single GPU to be partitioned into up to seven isolated instances for guaranteed resource allocation and enhanced utilization in multi-tenant environments. Integration with container orchestration advanced in 2021, as Kubernetes gained robust GPU sharing capabilities through device plugins and operators supporting MIG and time-slicing, enabling efficient workload distribution in cloud-native AI pipelines. By 2025, NVIDIA released vGPU software version 18.0, adding support for Windows Server 2025 and AI-optimized VDI, facilitating seamless Linux workloads via Windows Subsystem for Linux and broadening virtualization for generative AI applications.[20][7][21]
These developments were propelled by the transition from VDI-centric use cases to AI/ML imperatives, where efficient GPU sharing became critical for cost-effective scaling. The data center GPU market, encompassing virtualization technologies, grew from $18.4 billion in 2024 to a projected $92 billion by 2030, reflecting surging demand for virtualized compute in hyperscale environments.[22]
Virtualization Techniques
API Remoting
API remoting is a software-based technique for GPU virtualization that enables multiple virtual machines (VMs) to share a physical GPU without requiring specialized hardware support. In this approach, API calls from graphics or compute applications running in a guest VM—such as those to OpenGL for rendering or CUDA for parallel computing—are intercepted by a proxy driver or middleware in the guest. These calls are then serialized into a data stream and forwarded to the host system, either through inter-process communication (IPC) for local virtualization or over a network for remote execution. On the host, the calls are deserialized, executed on the physical GPU using the native driver, and the results are returned to the guest VM in a similar manner. This method abstracts the GPU hardware, allowing transparent access while the host retains full control over the device.[23][24]
Prominent implementations of API remoting include rCUDA, VirGL, and gVirtuS, each targeting specific APIs and use cases. rCUDA, introduced around 2010, focuses on remote CUDA execution, enabling GPU-accelerated applications in HPC clusters to offload computations to distant accelerators via network forwarding, thereby reducing the need for local GPUs in every node.[25] VirGL, developed as part of the Mesa 3D graphics library, provides OpenGL acceleration in QEMU-based VMs by translating guest OpenGL calls to host-side rendering through a virtual 3D GPU interface, supporting desktop and lightweight graphics workloads.[26] gVirtuS, originating from a 2010 framework for cloud-based GPGPU, offers general-purpose API forwarding for CUDA and other libraries, facilitating transparent virtualization across heterogeneous environments like ARM clusters accessing x86-hosted GPUs.[27]
The primary advantages of API remoting lie in its low hardware requirements and flexibility for distributed systems, as it requires no modifications to the GPU itself and supports dynamic resource sharing among VMs. It is particularly well suited for high-performance computing (HPC) environments where compute locality is less critical than resource efficiency, such as in multi-node setups integrated with Message Passing Interface (MPI) for parallel workloads like AI training or scientific simulations. However, drawbacks include significant latency from serialization, deserialization, and transmission—often requiring sub-20 μs round-trip times to limit overhead to under 5% in inference tasks—which can result in 20-50% performance degradation for bandwidth-intensive or latency-sensitive graphics applications due to data transfer overheads.[24][28]
From a security perspective, API remoting enhances isolation by confining guest access to mediated API interactions rather than direct hardware control, thereby reducing risks of GPU side-channel attacks or VM escapes that could arise in pass-through scenarios. This software-mediated approach ensures that sensitive data remains within VM boundaries, with the host enforcing access policies, though careful implementation is needed to prevent implicit resource contention.[23][29]
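As a concrete sketch of the forwarding model, an rCUDA-style deployment redirects a guest's CUDA calls to a GPU server elsewhere on the network by configuring the client library. The environment variable names below follow rCUDA's documented conventions but should be treated as illustrative, and the server name and binary name are placeholders:

    # Guest side: declare one remote GPU, served by the machine that owns the physical device
    export RCUDA_DEVICE_COUNT=1
    export RCUDA_DEVICE_0=gpuserver.example.com:0   # placeholder host and GPU index
    # Run an unmodified CUDA binary linked against the rCUDA client library; the library
    # intercepts CUDA calls, serializes them, and forwards them to the server for execution
    ./vector_add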
Device Emulation
Device emulation in GPU virtualization refers to the hypervisor's full software simulation of GPU hardware, presenting a virtual graphics device to the guest operating system that mimics physical GPU behavior without any involvement of actual hardware. The hypervisor intercepts and emulates key GPU components, including registers for configuration and control, memory mappings for framebuffers and textures, and command submission queues where the guest issues rendering instructions. These operations are handled entirely in software by the hypervisor's device models, ensuring isolation and compatibility while processing I/O traps from the guest. This technique is foundational in emulators like QEMU, where it enables basic graphics support in virtualized environments devoid of dedicated GPUs.[30][31]
A key example is QEMU's virtio-gpu device model, which implements a paravirtualized GPU interface for both 2D and limited 3D acceleration. The guest OS loads a virtio-compatible driver that communicates with the hypervisor via a standardized ring buffer, submitting graphics commands that QEMU emulates using CPU-based backends. For 3D workloads, virtio-gpu integrates with software renderers like LLVMpipe in the Mesa 3D graphics library, which translates OpenGL calls into multithreaded CPU instructions for rasterization, vertex processing, and shading without hardware acceleration. LLVMpipe leverages LLVM for just-in-time code generation, supporting up to 32 CPU cores for parallel execution, but remains constrained to basic OpenGL features.[31][32][33]
The advantages of device emulation include its independence from physical GPUs, providing broad compatibility across host hardware and allowing virtualization on standard servers or even CPU-only systems. It ensures strong isolation since no real hardware is shared, making it suitable for secure or resource-constrained deployments. However, performance drawbacks are significant: software-based rendering imposes heavy CPU overhead, limiting throughput to basic tasks and making complex 3D scenes impractically slow, often with frame rates below 30 FPS even on multi-core hosts for simple low-resolution workloads. This makes it viable only for lightweight graphics, such as desktop icons, text rendering, and simple UI elements, while failing for demanding applications like gaming or GPGPU compute due to the absence of parallel hardware execution. Unlike API remoting techniques, which can proxy compute operations to physical GPUs, device emulation cannot support high-performance GPGPU effectively.[31][32][34]
Technically, paravirtualized drivers in the guest enhance efficiency by reducing trap frequency compared to fully emulated legacy devices like VGA; the driver batches commands into virtqueues for the hypervisor to process, emulating responses for register reads/writes and memory operations. This handles straightforward workloads proficiently—such as 2D compositing in desktop environments—but bottlenecks arise in shader-heavy or texture-intensive scenarios, where CPU simulation of GPU pipelines leads to orders-of-magnitude slowdowns relative to native hardware.
GPGPU emulation is especially limited, as the model focuses on graphics APIs rather than parallel compute kernels.[33][31]
Over time, device emulation has benefited from integrations like the SPICE protocol, which enhances remote display by efficiently transporting emulated graphics output from the hypervisor to clients, supporting features such as dynamic resolution adjustment and multi-monitor setups without hardware dependencies. Where remote access was initially limited to frame-based protocols like VNC, SPICE's adoption in QEMU improved latency and bandwidth for software-rendered content, and the approach persists mainly as a fallback for hosts without GPU resources, having been largely supplanted by hardware-accelerated methods in production environments.[35][36]
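A minimal QEMU invocation illustrates the fully software path described above: a paravirtualized GPU combined with a SPICE remote display, with no physical GPU involved. The flags are standard options on recent QEMU versions, while the disk image path is a placeholder:

    # Boot a guest with a paravirtualized virtio GPU and a SPICE display; all rendering
    # is performed on the host CPU (e.g., via LLVMpipe inside the guest)
    qemu-system-x86_64 \
        -enable-kvm -m 4096 -smp 4 \
        -device virtio-gpu-pci \
        -spice port=5900,disable-ticketing=on \
        -drive file=guest.qcow2,format=qcow2   # placeholder disk image
    # Inside the guest, glxinfo -B should report a software renderer such as llvmpipe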
Fixed Pass-Through
Fixed pass-through, also known as direct device assignment or PCI passthrough, dedicates an entire physical GPU to a single virtual machine (VM) by assigning the hardware directly to the guest, allowing it to operate as if it were native hardware. This technique leverages frameworks like VFIO in Linux to bind the GPU device to the VM, bypassing the hypervisor's intervention in device operations.[37] The guest operating system interacts with the GPU through standard drivers, perceiving it as a physical device without emulation overhead.[38]
To implement fixed pass-through, an Input-Output Memory Management Unit (IOMMU), such as Intel VT-d or AMD-Vi, must be enabled in the host BIOS to provide address translation, DMA isolation, and interrupt remapping, ensuring the assigned GPU cannot access unauthorized host memory.[37] The setup involves unbinding the GPU from the host's native driver (e.g., via sysfs in Linux) and rebinding it to a VFIO driver like vfio-pci, which creates an IOMMU-protected container for the device.[37] In hypervisors such as KVM/QEMU, the GPU is then attached to the VM configuration, typically using commands or XML descriptors to specify the PCI device ID, allowing the guest to load its own vendor-specific drivers upon boot.[38]
This approach delivers near-native performance, often achieving 98-100% of bare-metal GPU efficiency in workloads like CUDA and OpenCL benchmarks across hypervisors including KVM.[39] It provides full access to GPU features, including compute capabilities and direct memory access, making it suitable for latency-sensitive applications. However, it lacks resource sharing, requiring one GPU per VM and leaving the device idle when the VM is powered off or inactive.[39] The configuration process is complex, demanding precise hardware compatibility and manual intervention for binding and isolation.[37]
Fixed pass-through is commonly employed in gaming VMs for high-fidelity rendering and single-tenant AI training environments where dedicated hardware maximizes throughput.[38] It also supports multi-GPU configurations, enabling passthrough of multiple devices to a single VM for scaled workloads. A key limitation is the dependency on one GPU per VM, which can lead to underutilization in multi-tenant setups, and challenges in error recovery; if the VM crashes, the GPU may enter an unresponsive state requiring host-level resets, as direct access prevents the hypervisor from managing device state.[40] This has prompted evolutions toward mediated techniques for safer sharing.[38]
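The host-side preparation outlined above can be sketched for a Linux/KVM host as follows; the PCI address is a placeholder for an actual GPU, and the exact steps vary by distribution:

    # 1. Enable the IOMMU via kernel parameters and reboot (Intel example): intel_iommu=on iommu=pt
    # 2. Unbind the GPU (placeholder address 0000:01:00.0) from its native driver and hand it to vfio-pci
    modprobe vfio-pci
    echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
    echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
    echo 0000:01:00.0 > /sys/bus/pci/drivers_probe
    # 3. Attach the device to a VM, e.g. on the QEMU command line:
    #    qemu-system-x86_64 ... -device vfio-pci,host=01:00.0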
Mediated Pass-Through
Mediated pass-through is a GPU virtualization technique that allows multiple virtual machines (VMs) to share a single physical GPU through kernel-level software mediation, providing each VM with a virtual GPU (vGPU) device while maintaining high performance and isolation. This method relies on the Linux mediated device (mdev) framework, which enables the creation of virtual devices backed by the physical GPU hardware. The hypervisor then schedules access to the GPU among the vGPUs using time-slicing mechanisms or lightweight approximations of Single Root I/O Virtualization (SR-IOV), ensuring fair resource allocation without dedicating the entire GPU to one VM.[41][4]
In practice, the mdev framework registers virtual device types with the VFIO (Virtual Function I/O) subsystem, allowing user-space tools to instantiate vGPUs as mediated devices. For instance, NVIDIA's vGPU driver integrates with this framework to generate mediated devices supporting configurable profiles, such as dividing a 16 GB GPU into up to 16 slices of 1 GB each, tailored to workload needs like graphics rendering or compute tasks. These vGPUs appear as PCI devices to the VMs, enabling direct driver access while the host kernel mediates command submissions and resource contention.[42][43]
The technique offers a balance between multi-tenancy and efficiency, supporting up to 32 VMs per GPU in fine-grained profiles, with performance reaching 80-95% of native execution for GPU-intensive workloads, depending on the sharing ratio and application. However, it introduces overhead from GPU context switching—typically 5-20%—and requires proprietary licensing for commercial implementations like NVIDIA vGPU. Additional technical aspects include memory pinning to prevent page faults during VM execution and error containment to limit the impact of faults to individual vGPUs rather than the host or other VMs.[4][42][44]
From a security perspective, mediated pass-through enhances isolation by leveraging Input-Output Memory Management Units (IOMMUs) to restrict DMA operations, preventing malicious VMs from accessing unauthorized memory regions on the host or peers. This mediation layer also confines GPU faults, such as invalid commands or resource exhaustion, to the affected VM, reducing the risk of denial-of-service across the system. Unlike fixed pass-through, which assigns the full GPU to a single VM, this approach enables secure sharing through scheduled, mediated access.[45]
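On a host whose vGPU driver registers mediated device types, the available profiles can be inspected through sysfs and instances created with mdevctl. The sketch below uses placeholder values; the PCI address and the profile name depend on the installed vendor driver:

    # List the mediated device types exposed by the GPU's host driver (placeholder PCI address)
    ls /sys/class/mdev_bus/0000:41:00.0/mdev_supported_types/
    # Create a vGPU instance of a chosen type (the type name here is illustrative)
    UUID=$(uuidgen)
    mdevctl start -u "$UUID" -p 0000:41:00.0 --type nvidia-222
    mdevctl list   # show running mediated devices
    # The resulting device can then be assigned to a VM, e.g. via a libvirt hostdev entry of type 'mdev'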
Hardware-Assisted Partitioning
Hardware-assisted partitioning leverages specialized GPU hardware features to divide a single physical GPU into multiple isolated sub-partitions or virtual functions, enabling direct assignment to virtual machines (VMs) with minimal software intervention. This approach primarily utilizes Single Root I/O Virtualization (SR-IOV), a PCIe standard that allows a physical function (PF) on the GPU to create multiple lightweight virtual functions (VFs), each appearing as an independent PCIe device assignable to separate VMs for direct I/O access. Complementing SR-IOV, proprietary technologies like NVIDIA's Multi-Instance GPU (MIG) further partition the GPU into isolated instances, allocating dedicated slices of compute cores, memory, and cache enforced at the hardware level to ensure resource exclusivity and security.
Under SR-IOV, the PCIe specification supports up to 256 VFs per PF, though GPU implementations typically limit this to 8–64 VFs depending on the device to balance resource granularity and overhead. Each VF provides near-direct access to GPU resources without hypervisor mediation, bypassing traditional software virtualization layers for reduced latency. In MIG, partitioning divides the GPU's streaming multiprocessors (SMs), high-bandwidth memory (HBM), and L2 cache into configurable slices—such as 1/7th or 1/3rd of total resources—with hardware mechanisms like memory protection units and fault isolation domains preventing cross-instance interference or data leakage.
Prominent examples include NVIDIA's A100 and H100 GPUs, which introduced MIG in 2020 and support up to seven isolated instances per GPU, each with independent compute (e.g., 10–40 SMs) and memory (e.g., 5–40 GB HBM) allocations tailored for data center workloads. AMD's Instinct MI-series accelerators, such as the MI25 and later models, employ SR-IOV via their MxGPU technology to generate up to 16 VFs, enabling fine-grained sharing of compute and memory resources across VMs. Intel has enabled SR-IOV on select discrete GPUs such as the Arc Pro series (e.g., the B50 and B60, introduced in 2025), allowing a card to be partitioned into multiple virtual GPUs for isolated graphics acceleration.[46]
This method delivers near-native performance, often exceeding 95% of bare-metal throughput per partition due to hardware-level resource dedication and minimal overhead, while providing strong isolation comparable to physical device passthrough. However, adoption is constrained by hardware availability—only specific high-end GPUs support these features—and partition sizes are fixed at configuration time, limiting dynamic resizing without rebooting the system. In 2025, hardware-assisted partitioning has evolved for AI applications through integration with confidential computing, where features like NVIDIA's Hopper and Blackwell GPU enclaves enable secure, attested execution of sensitive models in isolated partitions, protecting against host or multi-tenant threats during inference and training. AMD is also advancing GPU confidential computing capabilities on Instinct accelerators.[47][48]
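On Linux hosts, SR-IOV virtual functions are created through the standard PCI sysfs interface once the vendor's physical-function driver is loaded. The PCI address and VF count below are placeholders; the supported maximum depends on the GPU:

    # Query how many VFs the physical function supports (placeholder address)
    cat /sys/bus/pci/devices/0000:83:00.0/sriov_totalvfs
    # Create 8 virtual functions; each enumerates as its own PCI device assignable to a VM
    echo 8 > /sys/bus/pci/devices/0000:83:00.0/sriov_numvfs
    # List the newly created virtual functions
    ls -l /sys/bus/pci/devices/0000:83:00.0/ | grep virtfn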
Vendor-Specific Implementations
NVIDIA
NVIDIA's GPU virtualization ecosystem centers on its proprietary vGPU software, formerly known as GRID, which entered private beta in 2013 and enables multiple virtual machines to share a single physical GPU through time-slicing or hardware partitioning techniques.[49] This platform supports a range of data center GPUs, including Tesla, RTX, and A-series models, providing direct access to NVIDIA's graphics and compute capabilities in virtualized environments for applications like virtual desktops, professional visualization, and AI workloads.[1] By leveraging mediated pass-through, vGPU allows efficient resource allocation while maintaining isolation between VMs.[50]
Key features of NVIDIA vGPU include flexible profiles for VM sizing, such as the A40-8Q profile that assigns 8 GB of frame buffer to support medium-intensity graphics tasks.[51] The vGPU 18.0 release in 2025 extends compatibility to Windows Server 2025 as a guest OS, introduces AI-optimized VDI for generative AI applications, and incorporates confidential vGPU capabilities to enhance data privacy in multi-tenant setups.[7] These advancements prioritize secure, high-performance virtualization tailored for enterprise AI and remote work scenarios.
Hardware integration in NVIDIA vGPU utilizes SR-IOV on A100 and subsequent GPUs to enable virtual functions with full IOMMU protection, reducing overhead and improving VM isolation.[52] Complementing this, Multi-Instance GPU (MIG) partitioning divides a GPU like the A100 into up to seven independent instances, each with dedicated compute, memory, and bandwidth for assignment to VMs.[53] Licensing options are segmented by use case: vApps and vPC for virtual desktop infrastructure (VDI), vWS for professional visualization, and vCS for compute workloads supporting CUDA acceleration.[54] Performance scaling allows up to 32 vGPUs per physical GPU on models like the A40, optimizing density for large-scale deployments while preserving CUDA and multi-instance support for machine learning tasks.[55] The broader ecosystem integrates with NVIDIA GPU Cloud (NGC) for deploying pre-built AI containers on vGPU instances and with Kubernetes via the NVIDIA GPU Operator, facilitating orchestrated GPU sharing in containerized environments.[56][57]
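On MIG-capable hardware such as the A100, host-side partitioning can be driven with standard nvidia-smi MIG subcommands; the profile ID below is illustrative and the available profiles depend on the GPU model:

    # Enable MIG mode on GPU 0 (takes effect once the GPU is idle or after a reset)
    nvidia-smi -i 0 -mig 1
    # List the GPU instance profiles this GPU supports
    nvidia-smi mig -lgip
    # Create a GPU instance (profile ID 9 is illustrative) along with its default compute instance
    nvidia-smi mig -cgi 9 -C
    # Show the resulting MIG devices visible to workloads and hypervisors
    nvidia-smi -L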
AMD
AMD's GPU virtualization technology, known as MxGPU, was introduced in 2016 as the industry's first hardware-virtualized GPU solution, leveraging the Single Root I/O Virtualization (SR-IOV) standard to enable secure and efficient sharing of GPU resources among multiple virtual machines (VMs).[58] MxGPU partitions the physical GPU into virtual functions (VFs), each appearing as an independent device on the PCIe bus, allowing up to 16 vGPUs per physical GPU on supported models such as the Radeon Instinct MI25 and MI50 accelerators.[59] This spatial partitioning approach dedicates fixed slices of GPU resources—like compute units, memory, and engines—to each VF, eliminating the need for time-slicing and providing predictable quality of service without software-mediated overhead.[60]
Key features of MxGPU include direct hardware access for VMs, which supports both graphics and compute workloads through integration with APIs such as OpenGL and Vulkan, as well as AMD's ROCm open-source platform for high-performance computing (HPC) and AI applications.[59] Unlike time-sharing methods, MxGPU relies on SR-IOV for isolation and resource allocation, ensuring each vGPU receives dedicated hardware slices for enhanced security and minimal contention.[61] The technology supports fine-grained partitioning modes, such as SPX (single partition) with 1 VF or CPX (core partitioned) with 8 VFs, optimized for specific workloads like AI training on Instinct GPUs.[62]
Hardware support for MxGPU spans the MI-series Instinct accelerators, with SR-IOV enabling up to 16 VFs on models like MI25 and MI50, and broader capabilities on newer architectures.[59] As of 2025, updates for AI-focused workloads incorporate the CDNA architecture in Instinct MI350X and MI355X GPUs, which maintain MxGPU compatibility while delivering enhanced tensor core performance for machine learning tasks.[59] These GPUs, paired with AMD EPYC processors in server environments, facilitate scalable virtualization for data centers.
MxGPU achieves near-native performance for both graphics rendering and compute operations, with VMs accessing GPU resources directly via VFs to minimize latency and maximize throughput in virtualized setups.[61] The ecosystem relies on the open-source amdgpu driver stack, including a physical function (PF) driver for the host and virtual function (VF) drivers for guests, alongside ROCm for compute acceleration and AMD SMI tools for management.[63] This open-source emphasis promotes broad compatibility across hypervisors like KVM/QEMU, distinguishing MxGPU in enterprise deployments.
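Because MxGPU exposes each vGPU as a standard SR-IOV virtual function, the partitioning is visible with ordinary PCI tooling on the host once the physical-function driver (such as AMD's open-source GIM module) has been configured; the PCI address below is a placeholder:

    # List AMD GPU devices (PCI vendor ID 1002); virtual functions appear as additional devices
    lspci -d 1002: -nn
    # Inspect the virtual functions exposed by the physical function (placeholder address)
    ls -l /sys/bus/pci/devices/0000:43:00.0/ | grep virtfn
    # Each virtfn symlink is an independent PCIe function that can be assigned to a separate VM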
Intel and Others
Intel's approach to GPU virtualization emphasizes integrated graphics processing units (iGPUs) and low-power discrete solutions, prioritizing efficient sharing for virtual desktop infrastructure (VDI) and media workloads. Introduced with 5th generation Intel Core processors (Broadwell), Intel Graphics Virtualization Technology for graphics (GVT-g) provides mediated pass-through capabilities, enabling the creation of up to seven virtual GPUs (vGPUs) from a single iGPU. This technology emulates full or partial GPU instances, allowing multiple virtual machines to access graphics acceleration while maintaining isolation through the VFIO mediated device framework. GVT-g supports platforms up to 10th generation Intel Core processors and is particularly effective for lightweight graphics tasks, including 3D rendering and display output.[64][65][66]
A key feature of GVT-g is its integration with Intel Quick Sync Video, which enables hardware-accelerated video encoding and decoding within virtualized environments, supporting codecs like H.264 and HEVC for applications such as video conferencing and streaming. For newer integrated GPUs, such as the Iris Xe in 11th generation Intel Core processors (Tiger Lake) and beyond, Intel shifted to Single Root I/O Virtualization (SR-IOV), which partitions the iGPU into up to seven virtual functions for direct assignment to VMs, reducing overhead compared to emulation-based approaches. This SR-IOV support extends to discrete GPUs in the Intel Arc Pro series, including the Battlemage (B-series) lineup, where it facilitates time-sliced or partitioned access for multi-tenant scenarios. Additionally, the Intel Data Center GPU Flex Series, introduced in 2022, builds on SR-IOV with enhanced partitioning for VDI and visual AI, allowing flexible resource allocation across up to 32 Xe cores and 4 media engines per GPU, depending on the model (e.g., Flex 170).[67][68][69][46][70]
Performance benchmarks for these Intel solutions show strong results for light VDI use cases, with virtualized workloads achieving over 85% of native iGPU performance for tasks like office applications and basic 3D modeling, though efficiency drops for compute-heavy GPGPU operations due to emulation or partitioning overhead.[71] Intel's Gaudi3 AI accelerators, entering broader availability in 2025, incorporate SR-IOV-like virtualization through PCI passthrough in KVM environments, enabling scalable AI training and inference in virtualized data centers while supporting open-source frameworks like PyTorch.[72][73]
Other vendors contribute niche solutions tailored to specific ecosystems. ARM's Mali GPUs, common in mobile and embedded systems, support virtualization via paravirtualization extensions in hypervisors like KVM, where a modified kernel driver and arbiter remap registers and route interrupts to enable secure GPU sharing across VMs without full hardware passthrough. In cloud platforms, AWS leverages the Nitro hypervisor for underlying isolation, combined with software techniques like time-slicing in Amazon EKS, to share GPUs across EC2 instances for inference workloads. Google Cloud's Tensor Processing Units (TPUs) integrate virtualization layers through TPU VMs, allowing direct SSH access to dedicated or multi-sliced accelerators for AI tasks, optimizing for high-throughput matrix operations in virtualized setups.[74][75][76][77]
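On a GVT-g-capable host (integrated graphics up to roughly the 10th generation Core platforms), vGPUs are created through the i915 driver's mediated device types in sysfs. The type name below is one commonly exposed by the driver but should be treated as illustrative, since the available types vary by platform:

    # Kernel parameters typically required for GVT-g (then reboot): i915.enable_gvt=1 intel_iommu=on
    # Show the vGPU types exposed by the iGPU (0000:00:02.0 is the usual iGPU address)
    ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/
    # Create a vGPU of an illustrative type by writing a UUID to its create node
    echo "$(uuidgen)" > /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create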
Hypervisor and Platform Support
KVM/QEMU
KVM (Kernel-based Virtual Machine) serves as the kernel module providing hardware-assisted virtualization acceleration, while QEMU acts as the user-space emulator and orchestrator for device handling in virtual machines, enabling GPU virtualization through paravirtualized interfaces, direct passthrough, and mediated devices.[78] This combination supports VFIO (Virtual Function I/O) for fixed PCI passthrough, allowing a physical GPU to be directly assigned to a guest VM with near-native performance, and mediated devices (mdev) for sharing a single GPU across multiple VMs via vendor-specific virtualization frameworks.[78][79]
Setup for GPU virtualization in KVM/QEMU typically begins with enabling IOMMU support in the host kernel (e.g., via intel_iommu=on for Intel or amd_iommu=on for AMD in the GRUB configuration) to facilitate secure device isolation. For emulated graphics, virtio-gpu is configured as the virtual display device using QEMU's -device virtio-gpu option, paired with the virglrenderer backend on the host for 3D acceleration in guests supporting OpenGL.[80] PCI passthrough for fixed assignment is managed through libvirt by editing the VM's XML configuration to include a <hostdev> element with type='pci' and the VFIO driver, requiring the GPU to be unbound from host drivers beforehand using tools like vfio-pci.[78] For mediated vGPUs, vendor drivers (e.g., NVIDIA GRID or AMD MxGPU) are installed on the host to create mdev instances via mdevctl, which are then attached to VMs as PCI-like devices in libvirt XML with type='mdev'.[79][78]
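As a sketch of the virtio-gpu path described above, a recent QEMU can pair the paravirtualized GPU with virglrenderer so that guest OpenGL calls are executed on the host GPU; the disk image and mdev UUID are placeholders:

    # Host-accelerated guest graphics with virtio-gpu and virglrenderer (requires host OpenGL support)
    qemu-system-x86_64 \
        -enable-kvm -m 8192 -smp 4 \
        -device virtio-gpu-gl \
        -display gtk,gl=on \
        -drive file=guest.qcow2,format=qcow2   # placeholder disk image
    # A mediated vGPU created with mdevctl can instead be attached directly:
    #   -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/<uuid>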
Key features include seamless integration with libvirt for declarative VM configuration, allowing GPU devices to be specified in XML without direct QEMU command-line intervention, which simplifies management in tools like virt-manager. Multi-queue support in virtio devices enhances performance by distributing workloads across multiple vCPUs, reducing bottlenecks in I/O-intensive scenarios such as graphics rendering. As of October 2025, libvirt 11.8.0 and later versions provide support for NVIDIA Multi-Instance GPU (MIG) configurations through mediated devices, enabling fine-grained GPU resource allocation to VMs on compatible hardware like A100 or H100 GPUs.[81]
Limitations include the need to configure SR-IOV virtual functions manually, since VFs must be created and bound on the host before passthrough, and the fact that optimal performance is achieved primarily with Linux guests, which have better driver support for virtio and VFIO interfaces. Proxmox VE, a popular open-source virtualization platform, leverages KVM/QEMU for GPU sharing by supporting both PCI passthrough and mediated devices in its web-based interface, facilitating multi-VM GPU utilization in datacenter environments.[82][78]
VMware
VMware vSphere 7 and later versions integrate GPU virtualization through support for NVIDIA Virtual GPU (vGPU) software, which utilizes mediated devices to enable time-sliced sharing of NVIDIA GPUs among multiple virtual machines (VMs), and AMD MxGPU technology, which leverages SR-IOV for hardware-based partitioning of AMD GPUs.[83][60] This allows enterprise environments to deploy GPU-accelerated workloads efficiently on virtualized infrastructure. VMware Horizon, a virtual desktop infrastructure (VDI) solution, builds on vSphere to deliver remote access to these GPU-enabled VMs, optimizing for graphics-intensive applications such as design and simulation.[84]
To set up GPU virtualization in vSphere, administrators install the NVIDIA vGPU Manager on the ESXi host via the vSphere Client or command line, followed by configuring VM profiles that define the allocated GPU resources, such as frame buffer size and compute capabilities.[85] For AMD MxGPU, the process involves enabling SR-IOV in the server BIOS, running the MxGPU Setup Script to create virtual functions (VFs), and assigning these PCI passthrough devices to VMs through vSphere.[60] Fixed pass-through using SR-IOV is also supported for dedicated GPU allocation to individual VMs in both NVIDIA and AMD configurations.[86]
Key features include dynamic resource allocation via configurable vGPU profiles, which allow flexible partitioning of GPU memory and cores to match workload demands, and vMotion compatibility for live migration of GPU-enabled VMs between hosts without downtime, provided both hosts share compatible GPU configurations.[87] In 2025, NVIDIA vGPU 18.0 introduced AI extensions that enhance VDI support for machine learning tasks, including compatibility with Windows Subsystem for Linux on Windows Server 2025 and improved inference acceleration in virtualized environments.[7]
Performance is optimized for graphics and compute workloads, with NVIDIA vGPU enabling low-latency rendering in VDI scenarios and AMD MxGPU providing near-native throughput for professional visualization applications.[88][60] Configurations can support up to 16 vGPUs per physical GPU, depending on the hardware and profile selected, balancing density and performance for enterprise-scale deployments.[89][60]
Security for shared GPU environments is bolstered by vSphere Virtual Machine Encryption, which protects VM data at rest and in transit, including configurations with mediated pass-through devices where multiple VMs access the same physical GPU. Additionally, both NVIDIA vGPU and AMD MxGPU enforce isolation between virtual instances to prevent cross-VM interference, ensuring compliance in multi-tenant setups.[42][60]
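Host-side installation of the NVIDIA vGPU Manager on ESXi is commonly performed with VMware's esxcli tooling from the host shell; the bundle path below is a placeholder, and the exact command and component name vary by vGPU and ESXi release:

    # Place the host in maintenance mode, then install the vGPU Manager bundle (placeholder path)
    esxcli system maintenanceMode set --enable true
    esxcli software vib install -d /vmfs/volumes/datastore1/NVIDIA-vGPU-manager-bundle.zip
    # After rebooting the host, verify that the host driver detects the physical GPU
    nvidia-smi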
Microsoft Hyper-V
Microsoft Hyper-V provides GPU virtualization through two primary mechanisms: GPU Partitioning (GPU-P), introduced in Windows Server 2022, which enables sharing a single physical GPU among multiple virtual machines (VMs) by allocating dedicated fractions of its resources, and Discrete Device Assignment (DDA), which allows full passthrough of an entire GPU to a single VM for direct hardware access without hypervisor mediation.[90][91] GPU-P leverages hardware-assisted partitioning, similar to SR-IOV techniques, to create isolated virtual functions from the physical GPU, ensuring each VM receives a consistent slice of compute, memory, and encode/decode capabilities while maintaining security isolation.[90]
To set up GPU-P, administrators use PowerShell cmdlets on the Hyper-V host to enumerate supported GPUs, create partitions (e.g., dividing a GPU into four equal 25% slices), and assign them to VMs; this process requires compatible GPU drivers from vendors like NVIDIA or AMD installed on the host.[92][93] DDA setup involves dismounting the GPU from the host using PowerShell commands like Dismount-VMHostAssignableDevice, then assigning it to a VM via Add-VMAssignableDevice, followed by VM reconfiguration to recognize the device.[91] Both methods support NVIDIA and AMD GPUs, with NVIDIA's drivers enabling advanced features like vGPU profiles in partitioned modes.[94]
Key features of GPU-P include support for up to a vendor-defined maximum of partitions per GPU—often 4 or more depending on the hardware OEM configuration—and compatibility with SR-IOV for efficient resource virtualization, allowing VMs to access GPU resources as native PCIe devices.[92][90] Windows Server 2025 enhances this with live migration support for GPU-partitioned VMs and integration with NVIDIA vGPU software version 18.0, which provides optimized profiles for partitioned deployments on compatible hardware like the NVIDIA L4 or A40.[90][94]
GPU-P is suitable for inference and training tasks in cloud environments, though it may introduce minor overhead compared to full passthrough. However, for graphics-intensive applications like VDI, Hyper-V's partitioning is less flexible than specialized tools, as it prioritizes compute sharing over advanced rendering optimizations and lacks native multi-session desktop support without additional configuration.[90]