
GPU virtualization

GPU virtualization is a computing technique that enables the partitioning and sharing of a single physical graphics processing unit (GPU) among multiple virtual machines (VMs) within a virtualized environment, allowing each VM to access a portion of the GPU's computational resources as if it had dedicated hardware. This approach addresses the challenges of GPU underutilization in data center and cloud settings by providing isolation, scalability, and efficient resource allocation for demanding workloads.

The development of GPU virtualization emerged in the mid-2000s alongside the rise of general-purpose GPU (GPGPU) computing, with early research focusing on enabling GPU acceleration in virtualized systems to support high-performance computing (HPC) applications. Initial efforts, such as those documented in 2008, introduced taxonomies for virtualization strategies, including API remoting—where GPU commands are forwarded over a network or inter-process channel to a remote physical GPU—and direct device pass-through, which assigns an entire GPU to a single VM for near-native performance but limits sharing. By the early 2010s, mediated pass-through techniques like gVirt (introduced in 2014) advanced the field by combining device emulation in software with hardware isolation to support multiple VMs per GPU while running native drivers inside VMs.

Major implementations have been driven by industry leaders. NVIDIA's virtual GPU (vGPU) software, first released around 2013 as part of the GRID platform, provides hardware-accelerated sharing for virtual desktops, workstations, and compute applications across hypervisors such as VMware ESXi and KVM. vGPU profiles allocate specific fractions of GPU memory and compute cores to VMs, supporting use cases in AI training, graphics-intensive design, and virtual desktop infrastructure (VDI). Similarly, AMD's MxGPU and partition-based virtualization, integrated into products like the Versal AI Edge Series, divide GPU shaders into isolated slices and partitions, ensuring secure multi-VM access via hardware arbiters and memory management units for embedded and edge scenarios. These solutions emphasize time-slicing, spatial partitioning, and fine-grained scheduling to balance performance and fairness.

Key benefits of GPU virtualization include cost efficiency through resource consolidation, enhanced security via VM isolation to prevent interference between tenants, and improved scalability for cloud providers offering GPU-accelerated instances. However, challenges persist, such as overhead from context switching, ensuring equitable resource distribution among VMs, and maintaining low-latency performance for real-time graphics. Ongoing research explores hybrid approaches, including hardware-assisted virtualization compliant with standards like SR-IOV, to further optimize for emerging demands in AI and remote rendering. As of 2025, NVIDIA has released vGPU versions 18 and 19, enhancing support for AI workloads including LLM fine-tuning in virtualized environments.

Introduction

Definition and Principles

GPU virtualization is the process of abstracting a physical graphics processing unit (GPU) so that multiple virtual machines (VMs) or containers can share its resources for graphics rendering or general-purpose tasks, without providing direct hardware access to any individual instance. This abstraction allows a single physical GPU to be partitioned into multiple virtual GPUs (vGPUs), each appearing as a dedicated device to the guest environment, thereby facilitating efficient utilization in shared infrastructure.

The core principles of GPU virtualization revolve around resource isolation, scheduling, and balancing performance trade-offs. Resource isolation ensures that workloads from different VMs do not interfere with each other, maintaining security and stability by preventing unauthorized access to shared hardware components such as memory and compute cores. Scheduling mechanisms, such as time-slicing (where the GPU alternates execution between vGPUs over short intervals) or spatial partitioning (dividing the GPU into concurrent sub-units), manage access to optimize utilization while minimizing latency. These principles involve inherent trade-offs, where sharing efficiency gains come at the cost of overhead compared to native GPU access, typically resulting in a 3-10% reduction in throughput depending on the workload and virtualization technique. In the basic architecture, a host-level GPU driver acts as a mediator, intercepting and routing commands from guest VMs to the physical GPU while enforcing isolation policies. Guest environments interact with virtual GPUs through paravirtualized interfaces (requiring guest driver modifications for awareness of the virtualization layer) or fully virtualized interfaces (emulating a complete GPU without guest changes), enabling integration with hypervisors such as KVM or Xen.

GPUs play essential roles in both graphics rendering and general-purpose computing, which makes virtualization necessary to address resource inefficiencies. For graphics, APIs such as OpenGL—a cross-platform standard for high-performance 2D and 3D rendering—and DirectX—Microsoft's suite for hardware-accelerated 2D/3D graphics in multimedia applications—rely on GPUs to process vertex transformations, shading, and rasterization for real-time visuals in games and simulations. In general-purpose GPU (GPGPU) computing, frameworks like CUDA (NVIDIA's parallel computing platform) and OpenCL (an open standard for heterogeneous parallel programming) offload compute-intensive tasks such as machine learning training and scientific simulations to GPU cores for massive parallelism. Virtualization becomes crucial in multi-tenant environments like cloud data centers, where GPUs often remain underutilized due to bursty workloads, leading to inefficient resource pooling without sharing mechanisms.

The benefits of GPU virtualization include significant cost savings through resource pooling, allowing multiple tenants to share expensive hardware, and improved scalability for diverse workloads in cloud infrastructures. However, it introduces limitations such as overhead from command mediation and context switching, which can degrade performance, and potential risks such as side-channel attacks or data leakage in shared memory spaces between tenants.
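To make the scheduling principle concrete, the following Python sketch models round-robin time-slicing between two vGPUs. It is a toy, assumption-level model rather than any vendor's scheduler, but it shows how each virtual GPU receives the physical engine in fixed quanta, trading some added latency for fair sharing.

    from collections import deque

    # Toy round-robin time-slicing model (illustrative only): each vGPU gets the
    # physical engine for a fixed quantum; time advances whether or not a vGPU
    # had pending work (an idle slot).
    def time_slice(vgpu_queues, quantum_ms, total_ms):
        timeline = []
        elapsed = 0
        while elapsed < total_ms:
            for name, queue in vgpu_queues.items():
                if queue:                        # this vGPU has pending work
                    job = queue[0]
                    job["remaining_ms"] -= quantum_ms
                    timeline.append((elapsed, name, job["id"]))
                    if job["remaining_ms"] <= 0:
                        queue.popleft()
                elapsed += quantum_ms            # the quantum passes either way
        return timeline

    queues = {
        "vgpu0": deque([{"id": "render-A", "remaining_ms": 3}]),
        "vgpu1": deque([{"id": "train-B", "remaining_ms": 5}]),
    }
    for slot in time_slice(queues, quantum_ms=1, total_ms=10):
        print(slot)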

Historical Development

The concept of GPU virtualization emerged in the late 2000s, driven by the need to enable graphics acceleration within virtual machines for improved performance in hosted environments. In 2008, researchers at VMware introduced a foundational approach through their paper on the hosted I/O architecture, which proposed strategies for sharing GPU resources among multiple virtual machines, focusing initially on graphics workloads to overcome the limitations of software-only rendering. This work laid the groundwork for subsequent research in GPU virtualization techniques, emphasizing the challenges of direct hardware access in virtualized settings.

The 2010s marked significant commercialization and technical advancements, spurred by virtual desktop infrastructure (VDI) demands. NVIDIA launched VGX in 2012, the industry's first cloud-based GPU solution, enabling workstation-class graphics delivery to remote users across various devices and paving the way for scalable VDI deployments. Concurrently, the Linux kernel introduced mediated device support in 2016, facilitating secure GPU sharing via frameworks like VFIO-mdev, which allowed hypervisors to partition GPU resources without full passthrough. AMD followed in 2016 with SR-IOV support in its Radeon Instinct accelerators, introducing MxGPU technology that conformed to the Single Root I/O Virtualization standard for multi-user GPU partitioning. A pivotal academic contribution came from the 2014 paper on gVirt, which detailed a full GPU virtualization solution using mediated pass-through, enabling native drivers in guest VMs while supporting both graphics and compute workloads. The rise of general-purpose GPU (GPGPU) computing, ignited by NVIDIA's CUDA platform in 2006, further accelerated virtualization efforts, with demands intensifying around the 2017 AI boom as applications required efficient GPU resource allocation in shared environments.

Entering the 2020s, GPU virtualization evolved to address AI workloads and scalability, shifting focus from primarily VDI to compute-intensive sharing in cloud and containerized setups. NVIDIA introduced Multi-Instance GPU (MIG) with the A100 Tensor Core GPU in 2020, allowing a single GPU to be partitioned into up to seven isolated instances for guaranteed quality of service and enhanced utilization in multi-tenant environments. Integration with container orchestration advanced in 2021, as Kubernetes gained robust GPU sharing capabilities through device plugins and operators supporting MIG and time-slicing, enabling efficient workload distribution in cloud-native AI pipelines. By 2025, NVIDIA released vGPU software version 18.0, adding support for Windows Server 2025 and AI-optimized VDI, facilitating seamless Linux workloads via Windows Subsystem for Linux and broadening virtualization for generative AI applications. These developments were propelled by the transition from VDI-centric use cases to AI-driven imperatives, where efficient GPU sharing became critical for cost-effective scaling. The data center GPU market, encompassing virtualization technologies, grew from $18.4 billion in 2024 to a projected $92 billion by 2030, reflecting surging demand for virtualized compute in hyperscale environments.

Virtualization Techniques

API Remoting

API remoting is a software-based technique for GPU virtualization that enables multiple virtual machines (VMs) to share a physical GPU without requiring specialized hardware support. In this approach, API calls from graphics or compute applications running in a guest VM—such as calls to OpenGL for rendering or CUDA for parallel computing—are intercepted by a proxy driver or middleware in the guest. These calls are then serialized into a data stream and forwarded to the host system, either through inter-process communication (IPC) for local virtualization or over a network for remote execution. On the host, the calls are deserialized, executed on the physical GPU using the native driver, and the results are returned to the guest VM in a similar manner. This method abstracts the GPU hardware, allowing transparent access while the host retains full control over the device.

Prominent implementations of API remoting include rCUDA, VirGL, and gVirtuS, each targeting specific APIs and use cases. rCUDA, introduced around 2010, focuses on remote CUDA execution, enabling GPU-accelerated applications in HPC clusters to offload computations to distant accelerators via network forwarding, thereby reducing the need for local GPUs in every node. VirGL, developed as part of the Mesa 3D graphics library, provides OpenGL acceleration in QEMU-based VMs by translating guest rendering calls to host-side execution through a virtual 3D GPU interface, supporting desktop and lightweight graphics workloads. gVirtuS, originating from a 2010 framework for cloud-based GPGPU, offers general-purpose API forwarding for CUDA and other libraries, facilitating transparent virtualization across heterogeneous environments, such as ARM clusters accessing x86-hosted GPUs.

The primary advantages of API remoting lie in its low hardware requirements and flexibility for distributed systems, as it requires no modifications to the GPU itself and supports dynamic resource sharing among VMs. It is particularly well-suited for high-performance computing (HPC) environments where compute locality is less critical than resource efficiency, such as multi-node setups integrated with the Message Passing Interface (MPI) for parallel workloads like AI training or scientific simulations. However, drawbacks include significant latency from serialization, deserialization, and network transfer—often requiring sub-20 μs round-trip times to limit overhead to under 5% in compute tasks—which can result in 20-50% performance degradation for bandwidth-intensive or latency-sensitive graphics applications due to data transfer overheads. From a security perspective, API remoting enhances isolation by confining guest access to mediated interactions rather than direct control, thereby reducing risks of GPU side-channel attacks or VM escapes that could arise in pass-through scenarios. This software-mediated approach ensures that sensitive data remains within VM boundaries, with the host enforcing access policies, though careful implementation is needed to prevent implicit information leakage.
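The intercept-serialize-execute-return loop can be sketched in Python. This is an illustrative toy protocol, not the rCUDA or gVirtuS wire format: a guest-side stub serializes a call description and sends it to a host-side handler, which dispatches it to whatever backend drives the physical GPU and returns the result.

    import pickle
    import socket
    import struct

    # Length-prefixed message helpers shared by the guest stub and host handler.
    def send_msg(sock, obj):
        data = pickle.dumps(obj)
        sock.sendall(struct.pack("!I", len(data)) + data)

    def recv_msg(sock):
        (length,) = struct.unpack("!I", sock.recv(4))
        buf = b""
        while len(buf) < length:
            buf += sock.recv(length - len(buf))
        return pickle.loads(buf)

    # Guest side: what a proxy driver would do for each intercepted API call.
    def remote_call(host, port, api, args):
        with socket.create_connection((host, port)) as s:
            send_msg(s, {"api": api, "args": args})
            return recv_msg(s)

    # Host side: deserialize one request, run it on the real GPU stack, reply.
    def serve_one(port, dispatch):
        with socket.socket() as srv:
            srv.bind(("0.0.0.0", port))
            srv.listen(1)
            conn, _ = srv.accept()
            with conn:
                req = recv_msg(conn)
                result = dispatch[req["api"]](*req["args"])  # e.g., a kernel launch
                send_msg(conn, result)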

Device Emulation

Device emulation in GPU virtualization refers to the hypervisor's full software simulation of GPU hardware, presenting a virtual graphics device to the guest operating system that mimics physical GPU behavior without any involvement of actual hardware. The hypervisor intercepts and emulates key GPU components, including registers for configuration and control, memory mappings for framebuffers and textures, and command submission queues through which the guest issues rendering instructions. These operations are handled entirely in software by the hypervisor's device models, ensuring isolation and correctness while processing I/O traps from the guest. This technique is foundational in emulators like QEMU, where it enables basic graphics support in virtualized environments devoid of dedicated GPUs.

A key example is QEMU's virtio-gpu device model, which implements a paravirtualized GPU for basic display output and limited acceleration. The guest OS loads a virtio-compatible driver that communicates with the hypervisor via a standardized ring buffer, submitting graphics commands that QEMU emulates using CPU-based backends. For 3D workloads, virtio-gpu integrates with software renderers like LLVMpipe in the Mesa graphics library, which translates OpenGL calls into multithreaded CPU instructions for rasterization, vertex processing, and shading without GPU hardware. LLVMpipe leverages LLVM for just-in-time code generation, supporting up to 32 CPU cores for parallel execution, but remains constrained to basic OpenGL features.

The advantages of device emulation include its independence from physical GPUs, providing broad compatibility across host hardware and allowing virtualization on standard servers or even CPU-only systems. It ensures strong isolation since no real hardware is shared, making it suitable for secure or resource-constrained deployments. However, performance drawbacks are significant: software-based rendering imposes heavy CPU overhead, limiting throughput to basic tasks and rendering complex scenes impractically slow, often with frame rates below 30 frames per second even on multi-core hosts for simple low-resolution workloads. This makes it viable only for lightweight 2D graphics, such as desktop icons, text rendering, and simple interface elements, while failing for demanding applications like 3D gaming or GPGPU compute due to the absence of hardware execution. Unlike API remoting techniques, which can proxy compute operations to physical GPUs, device emulation cannot support high-performance GPGPU effectively.

Technically, paravirtualized drivers in the guest enhance efficiency by reducing trap frequency compared to fully emulated legacy devices like VGA; the driver batches commands into virtqueues for the hypervisor to process, which emulates responses for register reads/writes and memory operations. This handles straightforward workloads proficiently—such as 2D compositing in desktop environments—but bottlenecks arise in shader-heavy or texture-intensive scenarios, where CPU simulation of GPU pipelines leads to orders-of-magnitude slowdowns relative to native hardware. GPGPU emulation is particularly unsupported, as the model focuses on graphics APIs rather than parallel compute kernels.

Over time, device emulation has benefited from integrations like the SPICE protocol, which enhances remote display by efficiently transporting emulated graphics output from the hypervisor to clients, supporting features such as dynamic resolution adjustment and multi-monitor setups without hardware dependencies. Where remote access was initially limited to frame-based protocols like VNC, SPICE's adoption in QEMU improved latency and bandwidth for software-rendered content. Device emulation persists as a fallback for hosts without GPU resources, but it has been supplanted by hardware-accelerated methods in production environments.
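As a host-side illustration, the Python sketch below starts a QEMU guest with an emulated virtio-gpu display and no physical GPU involvement; the disk image path and memory size are placeholder assumptions, and 3D rendering inside such a guest falls back to a software path like LLVMpipe.

    import subprocess

    # Boot a guest whose only display adapter is the emulated virtio-gpu device.
    # Assumptions: qemu-system-x86_64 is installed and disk.qcow2 is an existing
    # guest image; no host GPU is touched, so all rendering is CPU-based.
    cmd = [
        "qemu-system-x86_64",
        "-enable-kvm",                # CPU virtualization only; graphics stay emulated
        "-m", "4096",
        "-drive", "file=disk.qcow2,format=qcow2",
        "-vga", "none",
        "-device", "virtio-gpu-pci",  # paravirtualized display device
        "-display", "gtk",            # host window showing the emulated output
    ]
    subprocess.run(cmd, check=True)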

Fixed Pass-Through

Fixed pass-through, also known as direct device assignment or passthrough, dedicates an entire physical GPU to a single virtual machine (VM) by assigning the hardware directly to the guest, allowing it to operate as if it were native hardware. This technique leverages frameworks like VFIO in Linux to bind the GPU device to the VM, bypassing the hypervisor's intervention in device operations. The guest operating system interacts with the GPU through standard drivers, perceiving it as a physical device without emulation overhead.

To implement fixed pass-through, an input-output memory management unit (IOMMU), such as Intel VT-d or AMD-Vi, must be enabled in the host to provide address translation, DMA isolation, and interrupt remapping, ensuring the assigned GPU cannot access unauthorized host memory. The setup involves unbinding the GPU from the host's native driver (e.g., via sysfs operations or driver blacklisting in Linux) and rebinding it to a VFIO driver like vfio-pci, which creates an IOMMU-protected container for the device. In hypervisors such as KVM/QEMU, the GPU is then attached to the VM configuration, typically using virsh commands or libvirt XML descriptors to specify the device address, allowing the guest to load its own vendor-specific drivers upon boot.

This approach delivers near-native performance, often achieving 98-100% of bare-metal GPU efficiency in graphics and compute benchmarks across hypervisors including KVM. It provides full access to GPU features, including compute capabilities and vendor-specific APIs, making it suitable for latency-sensitive applications. However, it lacks resource sharing, requiring one GPU per VM and leaving the device idle when the VM is powered off or inactive. The configuration process is complex, demanding precise hardware compatibility and manual intervention for binding and isolation. Fixed pass-through is commonly employed in gaming VMs for high-fidelity rendering and in single-tenant AI training environments where dedicated hardware maximizes throughput. It also supports multi-GPU configurations, enabling passthrough of multiple devices to a single VM for scaled workloads. A key limitation is the dependency on one GPU per VM, which can lead to underutilization in multi-tenant setups, and challenges in error recovery; if the VM crashes, the GPU may enter an unresponsive state requiring host-level resets, as direct access prevents the hypervisor from managing device state. This has prompted evolutions toward mediated techniques for safer sharing.
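The host-side rebinding step can be scripted directly against sysfs. The Python sketch below assumes root privileges, an IOMMU-enabled host, and an illustrative PCI address; production setups more often rely on kernel command-line options, driverctl, or hypervisor tooling to do the same thing.

    from pathlib import Path

    GPU = "0000:01:00.0"  # hypothetical PCI address of the GPU to pass through
    dev = Path("/sys/bus/pci/devices") / GPU

    # 1. Detach the GPU from whatever host driver currently owns it.
    unbind = dev / "driver" / "unbind"
    if unbind.exists():
        unbind.write_text(GPU)

    # 2. Force the vfio-pci driver for this device and re-probe it.
    (dev / "driver_override").write_text("vfio-pci")
    Path("/sys/bus/pci/drivers_probe").write_text(GPU)

    # The device now lives in an IOMMU-protected VFIO group and can be handed to
    # a VM, for example through a libvirt <hostdev> entry or QEMU's vfio-pci device.
    print("bound", GPU, "to vfio-pci")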

Mediated Pass-Through

Mediated pass-through is a GPU virtualization technique that allows multiple virtual machines (VMs) to share a single physical GPU through kernel-level software mediation, providing each VM with a virtual GPU (vGPU) device while maintaining high performance and isolation. This method relies on the Linux mediated device (mdev) framework, which enables the creation of virtual devices backed by the physical GPU hardware. The hypervisor then schedules access to the GPU among the vGPUs using time-slicing mechanisms or lightweight approximations of Single Root I/O Virtualization (SR-IOV), ensuring fair sharing without dedicating the entire GPU to one VM.

In practice, the mdev framework registers virtual device types with the VFIO (Virtual Function I/O) subsystem, allowing user-space tools to instantiate vGPUs as mediated devices. For instance, NVIDIA's vGPU software integrates with this framework to generate mediated devices supporting configurable profiles, such as dividing a 16 GB GPU into up to 16 slices of 1 GB each, tailored to workload needs like graphics rendering or compute tasks. These vGPUs appear as PCI devices to the guest, enabling direct access while the host driver mediates command submissions and resource contention.

The technique offers a balance between multi-tenancy and efficiency, supporting up to 32 vGPUs per GPU in fine-grained profiles, with performance reaching 80-95% of native execution for GPU-intensive workloads, depending on the sharing ratio and application. However, it introduces overhead from GPU context switching—typically 5-20%—and requires licensing for commercial implementations like NVIDIA vGPU. Additional technical aspects include memory pinning to prevent page faults during VM execution and error containment to limit the impact of faults to individual vGPUs rather than the host or other VMs. From a security perspective, mediated pass-through leverages input-output memory management units (IOMMUs) to restrict DMA operations, preventing malicious guests from accessing unauthorized memory regions on the host or peer VMs. This mediation layer also confines GPU faults, such as invalid commands or resource exhaustion, to the affected VM, reducing the risk of denial-of-service across the system. Unlike fixed pass-through, which assigns the full GPU to a single VM, this approach enables secure sharing through scheduled, mediated access.
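On hosts where a vendor vGPU driver exposes mediated device types, instantiating a vGPU is a sysfs operation. The Python sketch below is illustrative: the parent PCI address and the chosen type directory name are assumptions, since real type identifiers depend on the installed driver and GPU.

    import uuid
    from pathlib import Path

    # Parent GPU whose driver registered mdev types (hypothetical address).
    types_dir = Path("/sys/bus/pci/devices/0000:01:00.0/mdev_supported_types")

    # List the vGPU profiles the driver offers and how many are still available.
    for t in sorted(types_dir.iterdir()):
        name = (t / "name").read_text().strip()
        avail = (t / "available_instances").read_text().strip()
        print(f"{t.name}: {name} (available instances: {avail})")

    # Create one mediated device of a chosen type by writing a UUID to 'create'.
    chosen = types_dir / "nvidia-559"        # hypothetical type identifier
    vgpu_uuid = str(uuid.uuid4())
    (chosen / "create").write_text(vgpu_uuid)
    print("created mediated device", vgpu_uuid)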

Hardware-Assisted Partitioning

Hardware-assisted partitioning leverages specialized GPU hardware features to divide a single physical GPU into multiple isolated sub-partitions or virtual functions, enabling direct assignment to virtual machines (VMs) with minimal software intervention. This approach primarily utilizes Single Root I/O Virtualization (SR-IOV), a PCIe standard that allows a physical function (PF) on the GPU to create multiple lightweight virtual functions (VFs), each appearing as an independent PCIe device assignable to separate VMs for direct I/O access. Complementing SR-IOV, proprietary technologies like NVIDIA's Multi-Instance GPU (MIG) further partition the GPU into isolated instances, allocating dedicated slices of compute cores, memory, and cache enforced at the hardware level to ensure resource exclusivity and security.

Under SR-IOV, the PCIe specification supports up to 256 VFs per physical function, though GPU implementations typically limit this to 8–64 VFs depending on the device to balance resource granularity and overhead. Each VF provides near-direct access to GPU resources without mediation, bypassing traditional software layers for reduced latency. In MIG, partitioning divides the GPU's streaming multiprocessors (SMs), high-bandwidth memory (HBM), and cache into configurable slices—such as 1/7th or 1/3rd of total resources—with hardware mechanisms like dedicated memory paths and fault isolation domains preventing cross-instance interference or data leakage.

Prominent examples include NVIDIA's A100 and H100 GPUs, which introduced MIG in 2020 and support up to seven isolated instances per GPU, each with independent compute (e.g., 10–40 SMs) and memory (e.g., 5–40 GB HBM) allocations tailored for AI and HPC workloads. AMD's MI-series accelerators, such as the MI25 and later models, employ SR-IOV via MxGPU technology to generate up to 16 VFs, enabling fine-grained sharing of compute and memory resources across VMs. Intel has enabled SR-IOV on select discrete GPUs, such as the Arc Pro series (e.g., the B50 and B60, introduced in 2025), allowing partitioning into multiple virtual GPUs for isolated graphics acceleration.

This method delivers near-native performance, often exceeding 95% of bare-metal throughput per partition due to hardware-level resource dedication and minimal overhead, while providing isolation comparable to physical passthrough. However, adoption is constrained by hardware availability—only specific high-end GPUs support these features—and partition sizes are fixed at configuration time, limiting dynamic resizing without rebooting the system. More recently, hardware-assisted partitioning has evolved for AI applications through integration with confidential computing, where features like NVIDIA's Hopper and Blackwell GPU enclaves enable secure, attested execution of sensitive models in isolated partitions, protecting against host or multi-tenant threats during inference and training. AMD is also advancing confidential GPU computing capabilities on its Instinct accelerators.
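On an MIG-capable NVIDIA GPU, partitions are created from the host with nvidia-smi. The Python sketch below is an assumption-laden example—the GPU index and profile IDs are illustrative and require a supported driver—showing how instances might be enabled, created, and listed.

    import subprocess

    def run(args):
        print("+", " ".join(args))
        subprocess.run(args, check=True)

    # 1. Enable MIG mode on GPU 0 (takes effect once the GPU is idle or reset).
    run(["nvidia-smi", "-i", "0", "-mig", "1"])

    # 2. Create two GPU instances from profile 9 (commonly 1g.5gb on an A100-40GB)
    #    and default compute instances on each (-C).
    run(["nvidia-smi", "mig", "-cgi", "9,9", "-C"])

    # 3. List the resulting GPU instances; each slice can then be handed to a
    #    separate VM or container as an isolated device.
    run(["nvidia-smi", "mig", "-lgi"])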

Vendor-Specific Implementations

NVIDIA

NVIDIA's GPU virtualization ecosystem centers on its proprietary vGPU software, formerly known as GRID, which entered private beta in 2013 and enables multiple virtual machines to share a single physical GPU through time-slicing or hardware partitioning techniques. This platform supports a range of NVIDIA data center GPUs, including Tesla, RTX, and A-series models, providing access to NVIDIA's graphics and compute capabilities in virtualized environments for applications like virtual desktops, professional visualization, and AI workloads. By leveraging mediated pass-through, vGPU allows efficient sharing while maintaining isolation between VMs.

Key features of vGPU include flexible profiles for VM sizing, such as the A40-8Q profile that assigns 8 GB of frame buffer to support medium-intensity graphics tasks. The vGPU 18.0 release in 2025 extends compatibility to Windows Server 2025 as a guest OS, introduces AI-optimized VDI for generative AI applications, and incorporates confidential vGPU capabilities to enhance data privacy in multi-tenant setups. These advancements prioritize secure, high-performance virtualization tailored for enterprise and cloud scenarios.

Hardware integration in vGPU utilizes SR-IOV on the A100 and subsequent GPUs to enable virtual functions with full IOMMU protection, reducing overhead and improving isolation. Complementing this, Multi-Instance GPU (MIG) partitioning divides a GPU like the A100 into up to seven independent instances, each with dedicated compute, memory, and bandwidth for assignment to VMs. Licensing options are segmented by use case: vApps and vPC for virtual desktop infrastructure (VDI), vWS for professional visualization, and vCS for compute workloads supporting AI acceleration. Performance scaling allows up to 32 vGPUs per physical GPU on models like the A40, optimizing density for large-scale deployments while preserving quality of service and multi-instance support for compute tasks. The broader ecosystem integrates with NVIDIA GPU Cloud (NGC) for deploying pre-built containers on vGPU instances and with Kubernetes via the GPU Operator, facilitating orchestrated GPU sharing in containerized environments.
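On a host running the vGPU manager, the active vGPU instances can be inspected from the command line; the short Python sketch below simply shells out to collect that report, assuming the vGPU host driver (which adds the vgpu subcommand to nvidia-smi) is installed.

    import subprocess

    # Query active vGPU instances on a vGPU-manager host. On a plain consumer
    # driver the 'vgpu' subcommand is absent, so this is host-driver specific.
    report = subprocess.run(
        ["nvidia-smi", "vgpu", "-q"],
        capture_output=True, text=True, check=True,
    )
    print(report.stdout)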

AMD

AMD's GPU virtualization technology, known as MxGPU, was introduced in 2016 as the industry's first hardware-virtualized GPU solution, leveraging the Single Root I/O Virtualization (SR-IOV) standard to enable secure and efficient sharing of GPU resources among multiple virtual machines (VMs). MxGPU partitions the physical GPU into virtual functions (VFs), each appearing as an independent device on the PCIe bus, allowing up to 16 vGPUs per physical GPU on supported models such as the Radeon Instinct MI25 and MI50 accelerators. This spatial partitioning approach dedicates fixed slices of GPU resources—such as compute units, memory, and multimedia engines—to each VF, eliminating the need for time-slicing and providing predictable quality of service without software-mediated overhead.

Key features of MxGPU include direct hardware access for VMs, which supports both graphics and compute workloads through integration with APIs such as OpenGL and Vulkan, as well as AMD's ROCm open-source platform for high-performance computing (HPC) and AI applications. Unlike time-sharing methods, MxGPU relies on SR-IOV for isolation and resource allocation, ensuring each vGPU receives dedicated hardware slices for enhanced security and minimal contention. The technology supports configurable partitioning modes, such as SPX mode with a single VF or CPX mode with 8 VFs, optimized for specific workloads like AI training on Instinct GPUs.

Hardware support for MxGPU spans the MI-series accelerators, with SR-IOV enabling up to 16 VFs on models like the MI25 and MI50, and broader capabilities on newer Instinct accelerators. As of 2025, updates for AI-focused workloads incorporate the CDNA architecture in MI350X and MI355X GPUs, which maintain MxGPU compatibility while delivering enhanced matrix compute performance for AI tasks. These GPUs, paired with AMD EPYC processors in server environments, facilitate scalable virtualization for data centers. MxGPU achieves near-native performance for both graphics rendering and compute operations, with VMs accessing GPU resources directly via VFs to minimize latency and maximize throughput in virtualized setups. The ecosystem relies on the open-source amdgpu driver stack, including a physical function (PF) driver for the host and virtual function (VF) drivers for guests, alongside ROCm for compute acceleration and SMI-based tools for management. This open-source emphasis promotes broad compatibility across hypervisors like KVM/QEMU, distinguishing MxGPU in open-infrastructure deployments.
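Because MxGPU builds on standard SR-IOV, the host-side step of exposing virtual functions can be sketched with ordinary sysfs writes. The Python example below is illustrative—the PCI address is a placeholder and real deployments typically drive this through the vendor's host (PF) driver tooling.

    from pathlib import Path

    # Physical function of an SR-IOV-capable GPU (hypothetical address).
    pf = Path("/sys/bus/pci/devices/0000:03:00.0")

    total = int((pf / "sriov_totalvfs").read_text())
    print("VFs supported by this physical function:", total)

    # Enable a number of virtual functions; each VF then appears as its own
    # PCIe device that can be assigned to a separate VM.
    (pf / "sriov_numvfs").write_text(str(min(16, total)))

    vfs = sorted(p.name for p in pf.glob("virtfn*"))
    print("enabled VFs:", vfs)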

Intel and Others

Intel's approach to GPU virtualization emphasizes integrated graphics processing units (iGPUs) and low-power discrete solutions, prioritizing efficient sharing for virtual desktop infrastructure (VDI) and media workloads. Intel Graphics Virtualization Technology for graphics (GVT-g), supported from 5th generation Intel Core processors (Broadwell) onward, provides mediated pass-through capabilities, enabling the creation of up to seven virtual GPUs (vGPUs) from a single iGPU. This technology emulates full or partial GPU instances, allowing multiple virtual machines to access graphics acceleration while maintaining isolation through the VFIO mediated device framework. GVT-g supports platforms up to 10th generation Intel Core processors and is particularly effective for lightweight graphics tasks, including 3D rendering and display output. A key feature of GVT-g is its integration with Intel Quick Sync Video, which enables hardware-accelerated video encoding and decoding within virtualized environments, supporting codecs like H.264 and HEVC for applications such as video conferencing and streaming.

For newer integrated GPUs, such as the Iris Xe in 11th generation (Tiger Lake) processors and beyond, Intel shifted to Single Root I/O Virtualization (SR-IOV), which partitions the iGPU into up to seven virtual functions for direct assignment to VMs, reducing overhead compared to emulation-based approaches. This SR-IOV support extends to discrete GPUs in the Arc Pro series, including the Battlemage (B-series) lineup, where it facilitates time-sliced or partitioned access for multi-tenant scenarios. Additionally, the Data Center GPU Flex Series, introduced in 2022, builds on SR-IOV with enhanced partitioning for VDI and visual AI, allowing flexible resource allocation across up to 32 Xe cores and 4 media engines per GPU, depending on the model (e.g., the Flex 170).

Performance benchmarks for these Intel solutions show strong results for light VDI use cases, with virtualized workloads achieving over 85% of native iGPU performance for 3D tasks like office applications and basic media playback, though efficiency drops for compute-heavy GPGPU operations due to emulation or partitioning overhead. Intel's Gaudi 3 AI accelerators, entering broader availability in 2025, support virtualized deployment through PCI passthrough in KVM environments, enabling scalable training and inference in virtualized data centers while supporting open-source frameworks like PyTorch.

Other vendors contribute niche solutions tailored to specific ecosystems. ARM's Mali GPUs, common in mobile and embedded systems, support virtualization via paravirtualization extensions in hypervisors like KVM, where a modified kernel driver and a hypervisor-level arbiter remap registers and route interrupts to enable secure GPU sharing across VMs without full hardware passthrough. In cloud platforms, AWS leverages the Nitro hypervisor for underlying isolation, combined with software techniques like time-slicing in Amazon EKS, to share GPUs across EC2 instances for inference workloads. Google Cloud's Tensor Processing Units (TPUs) integrate virtualization layers through TPU VMs, allowing direct SSH access to dedicated or multi-sliced accelerators for AI tasks, optimizing for high-throughput matrix operations in virtualized setups.

Hypervisor and Platform Support

KVM/QEMU

KVM (Kernel-based Virtual Machine) serves as the kernel module providing hardware-assisted acceleration, while QEMU acts as the user-space emulator and orchestrator for device handling in virtual machines, enabling GPU virtualization through paravirtualized interfaces, direct passthrough, and mediated devices. This combination supports VFIO (Virtual Function I/O) for fixed PCI passthrough, allowing a physical GPU to be directly assigned to a guest VM with near-native performance, and mediated devices (mdev) for sharing a single GPU across multiple VMs via vendor-specific virtualization frameworks.

Setup for GPU virtualization in KVM/QEMU typically begins with enabling IOMMU support in the host (e.g., via intel_iommu=on for Intel or amd_iommu=on for AMD platforms in the GRUB configuration) to facilitate secure device isolation. For emulated graphics, virtio-gpu is configured as the virtual display device using QEMU's -device virtio-gpu option, paired with the virglrenderer backend on the host for 3D acceleration in guests supporting OpenGL. PCI passthrough for fixed assignment is managed through libvirt by editing the VM's XML configuration to include a <hostdev> element with type='pci' and the VFIO driver, requiring the GPU to be unbound from host drivers beforehand and bound to vfio-pci. For mediated vGPUs, vendor drivers (e.g., NVIDIA vGPU or AMD MxGPU) are installed on the host to create mdev instances via mdevctl, which are then attached to VMs as PCI-like devices in libvirt XML with type='mdev'.

Key features include seamless integration with libvirt for declarative VM configuration, allowing GPU devices to be specified in XML without direct command-line intervention, which simplifies management in tools like virt-manager. Multi-queue support in virtio devices enhances performance by distributing workloads across multiple vCPUs, reducing bottlenecks in I/O-intensive scenarios such as graphics rendering. As of October 2025, libvirt 11.8.0 and later versions provide support for NVIDIA Multi-Instance GPU (MIG) configurations through mediated devices, enabling fine-grained GPU resource allocation to VMs on compatible hardware like the A100 or H100 GPUs. Limitations include the need for manual configuration of SR-IOV virtual functions, which requires explicit host-side setup for creating and binding VFs before passthrough, and optimal performance is primarily achieved with Linux guests due to better driver support for virtio and VFIO interfaces. Proxmox VE, a popular open-source virtualization platform, leverages KVM/QEMU for GPU sharing by supporting both PCI passthrough and mediated devices in its web-based interface, facilitating multi-VM GPU utilization in data center environments.
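Libvirt's XML-driven model can also be scripted. The Python sketch below uses the libvirt-python bindings to attach a VFIO-bound GPU to an existing domain; the domain name and PCI address are placeholder assumptions, and the same <hostdev> XML could equally be added with virsh.

    import libvirt

    # <hostdev> fragment describing a VFIO-bound GPU at a hypothetical address.
    HOSTDEV_XML = """
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    """

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("guest1")          # hypothetical domain name

    # Persist the device in the VM definition; also apply live if it is running.
    flags = libvirt.VIR_DOMAIN_AFFECT_CONFIG
    if dom.isActive():
        flags |= libvirt.VIR_DOMAIN_AFFECT_LIVE
    dom.attachDeviceFlags(HOSTDEV_XML, flags)
    conn.close()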

VMware

VMware vSphere 7 and later versions integrate GPU virtualization through support for NVIDIA Virtual GPU (vGPU) software, which utilizes mediated devices to enable time-sliced sharing of GPUs among multiple virtual machines (VMs), and AMD MxGPU technology, which leverages SR-IOV for hardware-based partitioning of GPUs. This allows enterprise environments to deploy GPU-accelerated workloads efficiently on virtualized infrastructure. VMware Horizon, a virtual desktop infrastructure (VDI) solution, builds on vSphere to deliver remote access to these GPU-enabled VMs, optimizing for graphics-intensive applications such as design and simulation.

To set up GPU virtualization in vSphere, administrators install the NVIDIA vGPU Manager on the ESXi host via the vSphere Client or command line, followed by configuring VM profiles that define the allocated GPU resources, such as frame buffer size and compute capabilities. For MxGPU, the process involves enabling SR-IOV in the server BIOS, running the MxGPU Setup Script to create virtual functions (VFs), and assigning these passthrough devices to VMs through vSphere. Fixed pass-through using SR-IOV is also supported for dedicated GPU allocation to individual VMs in both NVIDIA and AMD configurations.

Key features include dynamic resource allocation via configurable vGPU profiles, which allow flexible partitioning of GPU memory and cores to match workload demands, and vMotion compatibility for live migration of GPU-enabled VMs between hosts without downtime, provided both hosts share compatible GPU configurations. In 2025, NVIDIA vGPU 18.0 introduced AI extensions that enhance VDI support for AI tasks, including compatibility with Windows Subsystem for Linux on Windows Server 2025 and improved inference acceleration in virtualized environments. Performance is optimized for graphics and compute workloads, with vGPU enabling low-latency rendering in VDI scenarios and MxGPU providing near-native throughput for professional visualization applications. Configurations can support up to 16 vGPUs per physical GPU, depending on the GPU model and profile selected, balancing density and performance for enterprise-scale deployments. Security for shared GPU environments is bolstered by vSphere Virtual Machine Encryption, which protects VM data at rest and in transit, including configurations with mediated pass-through devices where multiple VMs access the same physical GPU. Additionally, both vGPU and MxGPU enforce isolation between virtual instances to prevent cross-VM interference, ensuring compliance in multi-tenant setups.

Microsoft Hyper-V

Microsoft Hyper-V provides GPU virtualization through two primary mechanisms: GPU Partitioning (GPU-P), which enables sharing a single physical GPU among multiple virtual machines (VMs) by allocating dedicated fractions of its resources, and Discrete Device Assignment (DDA), which allows full passthrough of an entire GPU to a single VM for direct hardware access without mediation. GPU-P leverages hardware-assisted partitioning, similar to SR-IOV techniques, to create isolated virtual functions from the physical GPU, ensuring each VM receives a consistent slice of compute, memory, and encode/decode capabilities while maintaining security isolation.

To set up GPU-P, administrators use PowerShell cmdlets on the host to enumerate supported GPUs, create partitions (e.g., dividing a GPU into four equal 25% slices), and assign them to VMs; this process requires compatible GPU drivers from vendors like NVIDIA or AMD installed on the host. DDA setup involves dismounting the GPU from the host using commands like Dismount-VMHostAssignableDevice, then assigning it to a VM via Add-VMAssignableDevice, followed by VM reconfiguration to recognize the device. Both methods support NVIDIA and AMD GPUs, with NVIDIA's drivers enabling advanced features like vGPU profiles in partitioned modes.

Key features of GPU-P include support for up to a vendor-defined maximum number of partitions per GPU—often 4 or more depending on the OEM configuration—and compatibility with SR-IOV for efficient I/O, allowing VMs to access GPU resources as native PCIe devices. Windows Server 2025 enhances this with live migration support for GPU-partitioned VMs and integration with NVIDIA vGPU software version 18.0, which provides optimized profiles for partitioned deployments on compatible GPUs like the L4 or A40. GPU-P is suitable for AI inference and compute tasks in virtualized environments, though it may introduce minor overhead compared to full passthrough. However, for graphics-intensive applications like VDI, Hyper-V's partitioning is less flexible than specialized tools, as it prioritizes compute sharing over advanced rendering optimizations and lacks native multi-session desktop support without additional configuration.

Applications and Use Cases

Virtual Desktop Infrastructure (VDI)

GPU virtualization plays a pivotal role in Virtual Desktop Infrastructure (VDI) by enabling the delivery of remote desktops with hardware-accelerated graphics, allowing end-users to access GPU resources without dedicated physical hardware. This technology partitions GPU resources across multiple virtual machines (VMs), facilitating smooth 2D and 3D rendering for demanding applications while maintaining isolation and security. In VDI environments, it supports centralized data centers where users connect via remote protocols to virtualized desktops, optimizing resource utilization for organizations deploying large-scale desktop solutions.

In practical applications, GPU virtualization enables efficient 3D and 2D rendering within virtual desktops, particularly for professional tools such as computer-aided design (CAD) software and media suites, by offloading graphics workloads from the CPU to shared GPU instances. This sharing mechanism allows a single physical GPU to support multiple concurrent user sessions, reducing the need for one-to-one hardware mapping and enabling scalable deployment across diverse workloads like architectural modeling or video editing. For instance, NVIDIA's Virtual Data Center Workstation (vDWS) leverages this capability to deliver workstation-class performance in virtualized environments. Prominent examples include NVIDIA vGPU software, which supports up to 48 virtual desktops per GPU for profiles using 1 GB frame buffer allocations on high-memory cards like the L40, enabling efficient sharing for graphics-intensive VDI sessions. Similarly, VMware Horizon integrates with vGPU to provide secure, remote access to accelerated virtual desktops, allowing users to run immersive applications with centralized management and features like vMotion for live VM migration without downtime. These implementations ensure compliance with enterprise security standards through isolated GPU partitions.

The benefits of GPU virtualization in VDI include substantial reductions in hardware requirements by increasing user density—up to 30% more users compared to CPU-only setups—and centralizing IT management for easier updates and maintenance. In 2025, advancements in vGPU technology enable 4K resolution support with real-time ray tracing in virtual workstations, delivering low-latency rendering for design professionals and maintaining native-PC-like experiences rated at 99% user satisfaction. These improvements lower total cost of ownership (TCO) through optimized resource sharing and energy efficiency.

Despite these advantages, challenges persist, including latency introduced by remote display protocols such as Blast Extreme or PCoIP, which can affect responsiveness in high-bandwidth scenarios even though GPU acceleration substantially reduces overall delay compared to non-accelerated alternatives. Additionally, per-user licensing models for vGPU software add operational costs, requiring careful planning for large deployments. Effective resource allocation is crucial to mitigate contention among sessions. Quantitative metrics highlight the impact, with GPU virtualization enabling up to 48 users per GPU in light VDI workloads, compared to single-user physical desktops, and contributing to savings through higher density—organizations report reduced expenses and operational efficiencies that can approach 50% savings versus traditional physical setups by minimizing per-user hardware needs.
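The density figures above follow from simple frame-buffer arithmetic, sketched below in Python; the profile sizes mirror the examples in this section, and actual limits can be lower where a vendor caps the number of vGPUs per board (for example, 32 on the A40).

    # Frame-buffer-limited packing of vGPU profiles onto one physical board.
    def max_sessions(gpu_fb_gb, profile_fb_gb):
        return gpu_fb_gb // profile_fb_gb

    print(max_sessions(48, 1))   # 1 GB profiles on a 48 GB board -> 48 VDI sessions
    print(max_sessions(48, 8))   # 8 GB "8Q"-class profiles -> 6 workstation sessions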

Cloud Computing and AI/ML Workloads

GPU virtualization plays a pivotal role in cloud computing by enabling the efficient sharing of high-performance GPUs across multiple tenants, particularly for demanding AI and machine learning (ML) workloads. This technology allows virtual machines (VMs) or containers to access fractional GPU resources, supporting parallel processing for tasks like distributed model training using frameworks such as TensorFlow and PyTorch, as well as real-time inference in scalable deployments. By partitioning GPUs via mechanisms like NVIDIA Multi-Instance GPU (MIG), providers can allocate isolated slices to workloads, ensuring low-latency execution while maintaining hardware isolation for security. This approach facilitates elastic scaling, where GPU resources dynamically adjust to fluctuating demands in multi-tenant environments, reducing idle time and enabling cost-effective on-demand compute.

Prominent cloud platforms integrate GPU virtualization to accelerate AI/ML applications. For instance, Amazon Web Services (AWS) leverages NVIDIA vGPU software on EC2 instances like the G4dn series, which support GPU sharing for ML training and inference through container orchestration, allowing multiple pods to utilize a single GPU efficiently. Similarly, Microsoft Azure's NC-series virtual machines, powered by NVIDIA GPUs, enable virtualized access for AI workloads, combining compute acceleration with support for shared environments. In Kubernetes-based clusters, the NVIDIA device plugin exposes MIG partitions as schedulable resources, optimizing GPU allocation for containerized AI pipelines and supporting fine-grained sharing that boosts throughput for large language models (LLMs). GPU-aware schedulers such as NVIDIA Run:ai further enhance orchestration by managing dynamic MIG reconfiguration and priority queuing, addressing the growing needs of LLM training in 2025 and beyond.

The benefits of GPU virtualization in these contexts are substantial, particularly in utilization efficiency. Traditional bare-metal GPU deployments often achieve only 20-30% utilization due to static allocation and workload mismatches, whereas virtualization with tools like NVIDIA Run:ai can elevate this to 70-90% by enabling concurrent execution of diverse tasks and fractional resource assignment. This improvement translates to up to 4x higher workload density, lowering costs for cloud providers and users while supporting hybrid cloud deployments for innovation.

Despite these advantages, challenges persist in deploying GPU virtualization for AI/ML. Effective workload scheduling remains complex, as mismatched resource requests can lead to fragmentation and underutilization, requiring advanced scheduler plugins and policies for optimal placement. Data transfer overheads between host and guest environments can introduce latency, especially in distributed scenarios, potentially impacting performance by 10-20% without optimized networking. Additionally, confidential computing features, such as those in NVIDIA H100 GPUs, are increasingly vital for secure AI processing in public clouds, protecting sensitive data during inference and training through hardware enclaves; they add minimal compute overhead but require careful integration to mitigate bottlenecks. Market dynamics underscore the transformative impact of GPU virtualization on cloud infrastructure. The global GPU market, fueled by virtualization-enabled scalability for hybrid cloud applications, is projected to reach $228 billion by 2030, growing at a 13.7% CAGR from 2025 onward.
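In Kubernetes clusters where the NVIDIA device plugin advertises MIG slices as extended resources, a workload requests a slice rather than a whole GPU. The Python sketch below, using the official kubernetes client, is illustrative: the resource name assumes the mixed MIG strategy, and the image and command are placeholders.

    from kubernetes import client, config

    config.load_kube_config()

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="mig-inference"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="worker",
                    image="nvcr.io/nvidia/pytorch:24.08-py3",  # placeholder image
                    command=["python", "-c",
                             "import torch; print(torch.cuda.get_device_name(0))"],
                    resources=client.V1ResourceRequirements(
                        # Request one 1g.5gb MIG slice instead of a full GPU.
                        limits={"nvidia.com/mig-1g.5gb": "1"},
                    ),
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)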

References

  1. [1]
    NVIDIA Virtual GPU (vGPU) Software
    NVIDIA virtual GPU (vGPU) software is a graphics virtualization platform that extends the power of NVIDIA GPU technology to virtual desktops and apps, ...
  2. [2]
    GPU Virtualization and Scheduling Methods - ACM Digital Library
    In this survey article, we present an extensive and in-depth survey of GPU virtualization techniques and their scheduling methods. We review a wide range of ...
  3. [3]
    [PDF] GPU Virtualization on VMware's Hosted I/O Architecture
    This paper introduces a taxonomy of strategies for GPU virtualization and describes in detail the specific GPU virtualization archi- tecture developed for ...
  4. [4]
    [PDF] A Full GPU Virtualization Solution with Mediated Pass-Through
    Jun 19, 2014 · This paper introduces gVirt, a product level GPU virtualization implementation with: 1) full GPU virtualization running native graphics driver ...
  5. [5]
    What Is a Virtual GPU? - NVIDIA Blog
    Jun 11, 2018 · Virtualizing a data center GPU allowed it to be shared across multiple virtual machines. This greatly improved performance for applications and desktops.
  6. [6]
    Virtualization Overview - UG1784
    Jul 1, 2025 · These instances are formed by combining one or more GPU slices. Each instance operates independently, with its own dedicated connection to the ...
  7. [7]
    GPU Virtualization and Scheduling Methods - ACM Digital Library
    Jun 29, 2017 · In this survey article, we present an extensive and in-depth survey of GPU virtualization techniques and their scheduling methods.
  8. [8]
    OpenGL - The Industry's Foundation for High Performance Graphics
    ### Summary of OpenGL's Role in Graphics Rendering
  9. [9]
    DirectX graphics and gaming - Win32 apps - Microsoft Learn
    Sep 22, 2022 · Direct2D, Direct2D is a hardware-accelerated, immediate-mode, 2D graphics API that provides high-performance and high-quality rendering for 2D ...Getting started with DirectX... · Direct3D · Direct2D
  10. [10]
    CUDA Toolkit - Free Tools and Training
    ### CUDA for GPGPU Description
  11. [11]
  12. [12]
    (PDF) Enhancing Cloud Resource Utilization with GPU Virtualization
    Mar 14, 2025 · Traditional cloud computing infrastructures often struggle with underutilization or inefficient allocation of GPU resources, leading to ...
  13. [13]
    [PDF] Confidentiality Issues on a GPU in a Virtualized Environment
    Our objective is to highlight possible information leakage due to GPUs in virtualized and cloud computing environments. We provide insight into the different ...
  14. [14]
    GPU Virtualization on VMware's Hosted I/O Architecture - USENIX
    Nov 3, 2008 · This paper introduces a taxonomy of strategies for GPU virtualization and describes in detail the specific GPU virtualization architecture developed for VMware ...
  15. [15]
    NVIDIA Unveils Industry's First Cloud-Based GPU That Delivers ...
    Oct 17, 2012 · NVIDIA VGX K2 enables designers, engineers to work anywhere, on any device while accessing the performance of a workstation.
  16. [16]
    AMD introduces Radeon Instinct: Accelerating Machine Intelligence
    Dec 12, 2016 · Radeon Instinct accelerators feature passive cooling, AMD MultiGPU (MxGPU) hardware virtualization technology conforming with the SR-IOV (Single ...
  17. [17]
    NVIDIA A100 Tensor Core GPU
    The A100 80GB debuts the world's fastest memory bandwidth at over 2 terabytes per second (TB/s) to run the largest models and datasets.The Most Powerful End-To-End... · Deep Learning Inference · High-Performance Computing
  18. [18]
    NVIDIA Virtual GPU 18.0 Enables VDI for AI on Every Virtualized ...
    Mar 19, 2025 · Additionally, NVIDIA vGPU 18.0 supports Windows Subsystem for Linux (WSL) with Windows Server 2025, enabling seamless execution of Linux ...
  19. [19]
    Schedule GPUs - Kubernetes
    Sep 20, 2024 · Kubernetes includes stable support for managing AMD and NVIDIA GPUs (graphical processing units) across different nodes in your cluster, using device plugins.Missing: 2021 | Show results with:2021
  20. [20]
    Data Center GPU Strategic Research Report 2025-2030: Advances ...
    Oct 15, 2025 · The global market for Data Center GPU was estimated at US$18.4 Billion in 2024 and is projected to reach US$92.0 Billion by 2030, growing at a ...
  21. [21]
    GPU Virtualization: Techniques, Solutions & Best Practices
    Jan 3, 2025 · GPU virtualization allows multiple virtual machines (VMs) or applications to share the resources of a single physical GPU.
  22. [22]
    None
    ### Summary of GPU API Remoting from arXiv:2401.13354
  23. [23]
    rCUDA: Reducing the number of GPU-based accelerators in high ...
    rCUDA [38] virtualizes CUDA devices across clusters, enabling compute nodes to transparently execute kernels on remote GPUs, effectively creating a shared GPU ...Missing: origins | Show results with:origins
  24. [24]
    VirGL — The Mesa 3D Graphics Library latest documentation
    VirGL is a virtual 3D GPU for use inside QEMU virtual machines, that allows the guest operating system to use the capabilities of the host GPU to accelerate 3D ...
  25. [25]
    A GPGPU transparent virtualization component for high performance ...
    This paper describes the generic virtualization service GVirtuS (Generic Virtualization Service), a framework for development of split-drivers for cloud ...Missing: original | Show results with:original
  26. [26]
    [PDF] GPGPU VIRTUALIZATION TECHNIQUES A COMPARATIVE SURVEY
    This study comparatively reviews the recent GPU virtualization techniques including API remoting, para, full and hardware based virtualization, targeted for ...
  27. [27]
    [PDF] Efficient Performance-Aware GPU Sharing with Compatibility and ...
    Jul 9, 2025 · Meanwhile, the API-remoting method experiences both performance violations and low GPU utilization. 2.3.3 Potential Solution. While the above ...
  28. [28]
    Device Emulation — QEMU documentation
    QEMU supports the emulation of a large number of devices from peripherals such network cards and USB devices to integrated systems on a chip (SoCs).USB emulation · NVMe Emulation · Disk Images · InvocationMissing: gpu | Show results with:gpu
  29. [29]
    A closer look at VirtIO and GPU virtualisation | Blog - Linaro
    Jun 21, 2023 · The other approach seen in GPU virtualisation is API forwarding. This works by presenting the guest with an idealised piece of virtual hardware ...Missing: gVirtuS | Show results with:gVirtuS
  30. [30]
    LLVMpipe — The Mesa 3D Graphics Library latest documentation
    ### Summary of LLVMpipe as Software Renderer for GPU Emulation
  31. [31]
    VirtIO-GPU OpenCL Driver for Hardware Acceleration - Qualcomm
    Oct 15, 2024 · In this post, we will examine VirtIO-GPU, a VirtIO-based graphics adapter, and VCL, an OpenCL driver by Qualcomm Technologies, Inc. for VirtIO-GPU.
  32. [32]
    The three(ish) levels of QEMU VM graphics - Łukasz Adamczak
    Apr 9, 2020 · However, graphics-heavy DEs are only slighly faster than std , since any OpenGL operations are still handled by the llvmpipe software driver.
  33. [33]
    Spice User Manual
    Spice is an open remote computing solution, providing client access to remote displays and devices (eg keyboard, mouse, audio).Missing: hypervisor | Show results with:hypervisor
  34. [34]
    Controlling virtual machines with VNC and Spice - ADMIN Magazine
    One alternative to VNC, however, is the new Spice protocol, which promises superior speed and a number of additional features. The Virtual Graphics Adapter. For ...
  35. [35]
    VFIO - “Virtual Function I/O” - The Linux Kernel documentation
    The VFIO driver is an IOMMU/device agnostic framework for exposing direct device access to userspace, in a secure, IOMMU protected environment.Missing: GPU | Show results with:GPU
  36. [36]
  37. [37]
  38. [38]
    None
    ### Summary of Challenges with Direct Device Passthrough
  39. [39]
    [PDF] vgpu on kvm - vfio based mediated device framework
    Aug 25, 2016 · WHAT IS VGPU? Physical GPU shared among multiple virtual machines. Great performance and suitable for different workload. Full API compatibility ...
  40. [40]
    Virtual GPU Software User Guide - NVIDIA Docs
    If you are using NVIDIA vGPU software with CUDA on Linux, avoid conflicting installation methods by installing CUDA from a distribution-independent runfile ...
  41. [41]
    VFIO Mediated devices - The Linux Kernel Archives
    The VFIO driver framework provides unified APIs for direct device access. It is an IOMMU/device-agnostic framework for exposing direct device access to user ...Missing: vGPU | Show results with:vGPU
  42. [42]
    Virtual GPU vs. GPU Passthrough: Key Differences Explained
    May 14, 2025 · With vGPU, multiple VMs share the same hardware, which could pose shared resource vulnerabilities if not properly managed. More to read from ...
  43. [43]
    [PDF] Hardware-Assisted Mediated Pass-Through with VFIO
    ➢ PCI endpoints, platform devices, etc. ➢ PCI device sharing through PCIe® Single Root I/O Virtualization (SR-. IOV). • VFIO mediated device. ➢ vGPUs ...
  44. [44]
    [PDF] GRID VIRTUAL GPU - NVIDIA Docs
    ▻ Windows Server 2012 R2. 1.3.2 Linux. 64-bit Linux guest VMs are ... In this release of GRID vGPU, nvidia-smi provides basic reporting of vGPU instances.
  45. [45]
    Virtual GPU Software User Guide - NVIDIA Docs
    NVIDIA Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous, direct access to a single physical GPU, using the same NVIDIA graphics ...
  46. [46]
    Selecting the Correct vGPU Profiles - NVIDIA Docs
    Aug 5, 2025 · 8 GB RAM. Four vCPUs (2.4 GHz). A40-8Q vGPU Profile. Medium. 16 GB RAM. Eight vCPUs (2.6 GHz). A40-12Q. Heavy user. 32 GB RAM. 12 vCPUs (3.2 GHz).
  47. [47]
    Virtual GPU Software User Guide - NVIDIA Docs
    Single Root I/O Virtualization (SR-IOV) virtual functions enable full IOMMU protection for the virtual machines that are configured with vGPUs. Figure 1 shows a ...
  48. [48]
    NVIDIA Multi-Instance GPU (MIG)
    MIG can partition the GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores.Benefits Overview · How The Technology Works · Built For It And DevopsMissing: 2020 | Show results with:2020
  49. [49]
    [PDF] NVIDIA Virtual GPU Software Packaging, Pricing, and Licensing Guide
    Aug 1, 2024 · Version. Date. Authors. Description of Change. 01. April 22, 2020. CH, SM. Initial Release. 02. May 12, 2020. CH, SM.
  50. [50]
    [PDF] NVIDIA Virtual PC (vPC) Sizing Guide
    May 5, 2021 · Note: While the NVIDIA A40 has 48 GB of GPU memory, the maximum vGPUs per A40 GPU is limited to 32 for optimal performance. The NVIDIA A16 ...
  51. [51]
    NVIDIA NGC
    Software from the NGC catalog can be deployed on GPU-powered instances. The software can be deployed directly on virtual machines (VMs) or on Kubernetes ...
  52. [52]
    Using NVIDIA vGPU — NVIDIA GPU Operator - NVIDIA Docs
    You have access to a private registry such as NVIDIA NGC Private Registry and can push container images to the registry. Git and Docker or Podman are ...Prerequisites · Build The Driver Container · Configure The Cluster With...
  53. [53]
    AMD Reveals World's First Hardware-Virtualized GPU Product Line
    Feb 1, 2016 · IT budgets can realize support for up to 16 simultaneous users with a single AMD FirePro S7150 GPU card which features 8 GB of GDDR5 memory, ...
  54. [54]
    Getting started with Virtualization - Instinct™ Docs - AMD
    AMD's MxGPU technology utilizes SR-IOV to allow a single GPU to appear as separate devices on the PCIe bus, presenting virtual functions (VFs) to the operating ...Missing: mechanism | Show results with:mechanism
  55. [55]
    [PDF] AMD MxGPU and VMware Deployment Guide, v2.4
    AMD MxGPU technology uses the Single Root I/O Virtualization (SR-IOV) PCIe® virtualization standard to create up to 16 virtual MxGPUs per physical GPU. The ...
  56. [56]
    Getting started with Virtualization - Instinct™ Docs
    AMD's MxGPU technology utilizes SR-IOV to allow a single GPU to appear as separate devices on the PCIe bus, presenting virtual functions (VFs) to the operating ...
  57. [57]
    GPU Partitioning — AMD Instinct Virtualization Driver
    The partitioning is based on the number of VFs enabled, which can be configured as follows: 1 VF (SPX). 8 VFs (CPX). To achieve the desired static compute ...
  58. [58]
    AMD Instinct Virtualization Driver
    AMD's MxGPU technology enables GPU virtualization through SR-IOV (Single Root I/O Virtualization), allowing virtual machines to share AMD accelerator resources ...
  59. [59]
    GVTg_Setup_Guide · intel/gvt-linux Wiki - GitHub
    Oct 3, 2024 · Intel GVT-g is a full GPU virtualization solution with mediated pass-through (VFIO mediated device framework based), starting from 5th generation Intel Core(TM ...
  60. [60]
    Graphics Virtualization Technologies Support for Each Intel ...
    To find which graphics virtualization technology is supported on each Intel® graphics family, refer to the table below.
  61. [61]
    GVT-g high-level design - Project ACRN Documentation
    Jun 27, 2019 · GVT-g uses the physical GPU to directly execute all the commands submitted from a VM, so it avoids the complexity of emulating the Render Engine ...
  62. [62]
    Do 11th Generation Intel® Processors Support GVT-g Technology?
    GVT-g technology is not supported. The 11th Generation of Intel® Core Processors works with SR-IOV (Single Root IO Virtualization), that is a new ...
  63. [63]
    Intel GPU | Jellyfin
    This tutorial guides you on setting up full video hardware acceleration on Intel integrated GPUs and ARC discrete GPUs via QSV and VA-API.
  64. [64]
    Intel Gen 12 vGPU (SR-IOV) on Proxmox - GitHub
    This guide is designed to help you virtualize the 12th-generation Intel integrated GPU (iGPU) and share it as a virtual GPU (vGPU) with hardware acceleration.
  65. [65]
    SR-IOV Will Only Be Supported On Intel Arc Pro Graphics Cards
    Aug 13, 2025 · SR-IOV for virtualization with the Intel Xe kernel graphics driver will only be supported on the Arc Pro products and -- unfortunately -- not ...
  66. [66]
    Intel® Data Center GPU Flex Series
    Deliver unmatched flexibility for virtual desktop infrastructure, media processing, and visual AI with the Intel® Data Center GPU Flex Series.
  67. [67]
    Why I stopped using Intel GVT-g on Proxmox with Quick Sync - ktz.blog
    Feb 23, 2021 · Performance via GVT-g is anywhere from 58-82% slower than Quick Sync being used natively on the bare metal host.
  68. [68]
    Poor performance with Intel GVT-g - Linux - Level1Techs Forums
    Jan 13, 2021 · Unfortunately the performance is anywhere from 20-50% of what the iGPU is capable of. This is using an i5 8500 on Proxmox ...
  69. [69]
    Virtualization — Gaudi Documentation 1.22.1 documentation
    This document describes how to allocate Intel Gaudi AI accelerator for KVM Guests on Ubuntu 22.04.5 LTS, Ubuntu 24.04.2 LTS and RHEL 9.4. PCI passthrough is ...
  70. [70]
    Intel Gaudi 3 Expands Availability to Drive AI Innovation at Scale
    May 19, 2025 · Available through Dell AI Factory, Intel Gaudi 3 AI accelerators deliver high performance, open source flexibility and enterprise-grade infrastructure to speed ...
  71. [71]
    Paravirtualization with Mali GPUs - Arm Mali GPU Virtualization Guide
    Paravirtualization with Mali GPUs uses a modified hypervisor, an arbiter, and a kernel driver. The hypervisor reroutes interrupts and remaps registers for the ...
  72. [72]
    AWS Nitro System
    AWS Nitro System is a lightweight hypervisor that provides improved compute and networking performance for EC2 instances.
  73. [73]
    GPU sharing on Amazon EKS with NVIDIA time-slicing and ...
    Sep 12, 2023 · Amazon EKS users can enable GPU sharing by integrating the NVIDIA Kubernetes device plugin. This plugin exposes the GPU device resources to the Kubernetes ...
  74. [74]
    Introducing Cloud TPU VMs | Google Cloud Blog
    Jun 1, 2021 · New Cloud TPU VMs let you run TensorFlow, PyTorch, and JAX workloads on TPU host machines, improving performance and usability, and reducing ...
  75. [75]
    Chapter 13. Managing GPU devices in virtual machines | 8
    To access and control GPUs that are attached to the host system, you must configure the host system to pass direct control of the GPU to the virtual machine ( ...
  76. [76]
    Linux with KVM - NVIDIA Docs
    SR-IOV and the manual placement of vGPUs on GPUs in equal-size mode are not supported on GPUs based on the NVIDIA Turing™ architecture. GPU, Mixed vGPU ...
  77. [77]
    GPU virtualisation with QEMU/KVM - Ubuntu Server documentation
    Need native performance: Use PCI passthrough of additional GPUs in the system. You'll need an IOMMU set up, and you'll need to unbind the cards from the host ...
  78. [78]
    libvirt releases
    qemu: Add support for NUMA affinity of PCI devices. To support NVIDIA Multi-Instance GPU (MIG) configurations, libvirt now handles QEMU's acpi-generic ...
  79. [79]
    QEMU version 9.0.0 released
    Apr 23, 2024 · We'd like to announce the availability of the QEMU 9.0.0 release. This release contains 2700+ commits from 220 authors.
  80. [80]
    Chapter 18. Optimizing virtual machine performance | 9
    When using virtio-blk or virtio-scsi storage devices in your virtual machines (VMs), the multi-queue feature provides improved storage performance and ...
  81. [81]
    PCI Passthrough - Proxmox VE
    Sep 3, 2025 · PCI passthrough allows you to use a physical PCI device (graphics card, network card) inside a VM (KVM virtualization only).
  82. [82]
    VMware vSphere - NVIDIA Docs
    Features Deprecated in Release 19.0: NVIDIA vGPU software 19 is the last release branch to support the following graphics cards: Tesla M10. Tesla V100 SXM2.
  83. [83]
    VMware Horizon and vSphere | NVIDIA Virtual GPU
    NVIDIA vGPU solutions, combined with VMware technology, accelerate and secure virtual machines, desktops, and applications in the data center.
  84. [84]
    NVIDIA Virtual GPU (vGPU): VMware vSphere Deployment Guide
    Overview · Supported NVIDIA GPUs · Choosing Your Hardware · General Prerequisites · Server BIOS Settings · VMware vSphere Installation and vGPU Configuration.
  85. [85]
    Virtual GPU Software Supported Products - NVIDIA Docs
    Apr 21, 2025 · NVIDIA vGPU software supports only 64-bit guest operating systems. No 32-bit guest operating systems are supported.
  86. [86]
    Using vSphere vMotion to Migrate vGPU Virtual Machines - TechDocs
    You can use vMotion to perform a live migration of NVIDIA vGPU-powered virtual machines without causing data loss. To enable vMotion for vGPU virtual machines, ...
  87. [87]
    [PDF] VMware vSphere 7 with NVIDIA AI Enterprise time-sliced vGPU vs ...
    GPU virtualization is managed by the drivers installed inside the VM and the hypervisor. It exposes vGPUs to VMs and shares a physical GPU across multiple VMs.
  88. [88]
    What's New - NVIDIA Docs
    Apr 21, 2025 · New Features in Release 18.0. Support for vGPUs with different amounts of frame buffer on the same physical GPU ("mixed-size mode") on legacy ...
  89. [89]
    Partition and share GPUs with virtual machines on Hyper-V
    Jul 24, 2025 · Each VM can access only the GPU resources dedicated to them and the secure hardware partitioning prevents unauthorized access by other VMs.
  90. [90]
    Deploy graphics devices by using Discrete Device Assignment
    Feb 19, 2025 · Learn how to use Discrete Device Assignment (DDA) to pass an entire PCIe device into a virtual machine (VM) with PowerShell.
  91. [91]
    Partition and assign GPUs to a virtual machine in Hyper-V
    Jan 9, 2025 · This article describes how to configure graphics processing unit (GPU) partitions and assign a partition to a virtual machine (VM).
  92. [92]
    Microsoft Windows Server Installation and vGPU Configuration
    Jun 2, 2025 · Creating GPU Partitions. To create a GPU partition, start by listing the GPU adapters that support GPU-P by running the following command.
  93. [93]
    NVIDIA Virtual GPU Software v18.0 through 18.5
    The NVIDIA virtual GPU software management SDK enables third party applications to monitor and control all NVIDIA physical GPUs and virtual GPUs that are ...
  94. [94]
    Manage GPUs using partitioning for Azure Local (preview)
    Aug 29, 2025 · This article describes how to manage GPUs using partitioning (GPU-P) for Azure Local virtual machines (VMs) enabled by Azure Arc. GPU-P allows ...
  95. [95]
    NVIDIA Virtual GPU Solutions
    NVIDIA vGPU for VDI Summary
  96. [96]
    Selecting the Right NVIDIA GPU for Virtualization
    May 14, 2025 · This document provides guidance on selecting the optimal combination of NVIDIA GPUs and virtualization software specifically for virtualized ...
  97. [97]
    Best Virtual Workstations for 2025: Power, Flexibility & Remote Access
    NVIDIA RTX technology enables virtual workstations to handle real-time ray tracing and image processing with low latency, allowing professionals to work ...
  98. [98]
    VMware Horizon Blast Extreme Acceleration with NVIDIA GRID
    Feb 25, 2016 · Using VMware Horizon Blast Extreme with NVIDIA GRID decreased the overall latency by 51ms in comparison to PCoIP and by 27ms compared to VMware ...
  99. [99]
    How Virtual GPUs Work, Use Cases & Critical Best Practices
    Jan 1, 2025 · Cost efficiency: By enabling resource sharing, vGPUs eliminate the need for dedicated GPUs for each VM. This reduces hardware expenses and ...
  100. [100]
    NVIDIA AI Enterprise and NVIDIA vGPU (C-Series)
    NVIDIA Virtual GPU (C-Series) accelerates AI and ML workloads by enabling multiple virtual machines to have simultaneous, direct access to a single physical GPU ...
  101. [101]
    MIG Support in Kubernetes - NVIDIA Docs
    Nov 25, 2024 · This section walks through the steps necessary to deploy and run the k8s-device-plugin and gpu-feature-discovery components for the various MIG ...
  102. [102]
    Accelerate AI & Machine Learning Workflows | NVIDIA Run:ai
    Learn how AI-native workload orchestration maximizes GPU efficiency, streamlines AI infrastructure management, and scales AI workloads seamlessly across hybrid ...
  103. [103]
    Install Kubernetes device plugin for GPUs - Amazon EKS
    Oct 3, 2025 · The following procedure describes how to install the NVIDIA Kubernetes device plugin and run a sample test on NVIDIA GPU instances.
  104. [104]
    NVIDIA device plugin for Kubernetes - GitHub
    The NVIDIA device plugin for Kubernetes is a Daemonset that allows you to automatically: Expose the number of GPUs on each node of your cluster ...
  105. [105]
    GPU Virtualization - Volcano
    Jun 24, 2025 · Volcano addresses this by providing robust virtual GPU (vGPU) scheduling capabilities, facilitating efficient sharing of physical GPUs among ...
  106. [106]
    Maximizing GPU Utilization using NVIDIA Run:ai in Amazon EKS
    Jun 9, 2025 · Run:ai's fractional GPU technology solves challenges like static allocation, resource competition, and inefficiency in shared GPU clusters.
  107. [107]
    Practical Tips for Preventing GPU Fragmentation for Volcano ...
    Mar 31, 2025 · Here's a detailed look at the problem, our approach, and the results. Problem: GPU fragmentation and scheduling Inefficiencies. The DGX Cloud ...
  108. [108]
    Confidential Computing on Nvidia H100 GPU - Phala Network
    Sep 5, 2024 · This report evaluates the performance impact of enabling Trusted Execution Environments (TEE) on NVIDIA H100 GPUs for large language model (LLM) ...
  109. [109]
    Confidential Computing on NVIDIA H100 GPUs for Secure and ...
    Aug 3, 2023 · The NVIDIA H100 Tensor Core GPU is the first ever GPU to introduce support for confidential computing. It can be used in virtualized environments.
  110. [110]
    Data Center GPU Industry worth $228.04 billion by 2030
    May 15, 2025 · The global data center GPU market is projected to reach USD 119.97 billion in 2025 and USD 228.04 billion by 2030, registering a CAGR of 13.7% during the ...