VMDK
The Virtual Machine Disk (VMDK) format is an open file format specification developed by VMware for representing virtual hard disk drives in virtual machines, allowing guest operating systems to interact with them as standard physical disks while storing data in files on the host system's storage.[1] It enables efficient virtualization by supporting dynamic allocation of storage space and compatibility across VMware products such as vSphere, Workstation, and ESXi.[2] A VMDK virtual disk is structured as a descriptor file (typically named with a .vmdk extension) that contains metadata describing the disk's geometry, capacity (measured in 512-byte sectors), and layout, paired with one or more extent files that hold the raw data.[1] Extents can be sparse (growable, allocating space on demand to support thin provisioning), flat (preallocated for fixed-size disks), or device-backed (mapping directly to physical storage), and they may be monolithic or split into smaller files (e.g., up to 2 GB each) for easier transfer.[2] This modular design facilitates features like snapshots through delta links, where changes are stored in child disks without altering the base, enhancing backup, cloning, and recovery processes in virtual environments.[1]
Introduced with VMware's earliest products in the late 1990s, the VMDK format has evolved through successive versions (reaching version 2 by 2007, with version 3 introduced in 2009) to support larger capacities and advanced storage options, and it remains a cornerstone of VMware's ecosystem in modern releases like vSphere 8.0.[1][2][3] Its openness allows interoperability with third-party tools, though files must be handled carefully to avoid corruption, often requiring VMware-specific utilities for mounting or extraction.[2]
Introduction
Definition and Purpose
VMDK, which stands for Virtual Machine Disk, is a file-based format that represents virtual hard disk drives for use in virtual machines.[1][4] It functions as a container for the complete storage requirements of a virtual machine, encompassing the operating system, applications, and user data, all stored within one or more files on the host system's filesystem.[1][5] The primary purpose of the VMDK format is to enable efficient virtualization by mimicking physical disk behavior in software environments, allowing virtual machines to operate seamlessly as if connected to standard hardware drives.[1][4] This encapsulation supports key virtualization workflows, such as deploying and managing isolated computing environments without dedicated physical storage.[5] Key characteristics of VMDK include support for both monolithic configurations, where the entire disk resides in a single file, and split configurations, which divide the disk into multiple smaller files for easier handling or transfer.[1][4] It also facilitates advanced features like snapshots, which capture disk states for backup or testing, and cloning, which duplicates virtual disks for rapid VM replication.[1][5] These capabilities are available alongside provisioning options such as thin or thick allocation to optimize storage usage.[4] Within the broader virtualization landscape, VMDK is a core component of the VMware ecosystem, powering products like vSphere and Workstation, yet it has been openly specified since 2011 to promote interoperability with other platforms, including Oracle VirtualBox and QEMU.[1][5] This standardization ensures that VMDK files can be used across diverse hypervisors, enhancing portability in multi-vendor environments.[4][5]

Development History
The VMDK (Virtual Machine Disk) format originated in the late 1990s, developed by VMware as a core component of its pioneering virtualization software, including VMware Workstation, which was first released on May 15, 1999. This initial implementation provided a container for virtual hard disk images, enabling the simulation of physical storage within virtual machines on hosted hypervisors.[6][7] As a proprietary format in its early years, VMDK evolved through internal updates to support growing virtualization demands, but a pivotal shift occurred in 2008 when VMware collaborated with the Distributed Management Task Force (DMTF) to include VMDK as a supported disk format in the Open Virtualization Format (OVF) specification, with OVF version 1.0 released in 2009. The detailed VMDK specification was openly released by VMware on December 20, 2011 (revision 5.0), promoting full interoperability across virtualization platforms.[8][9] Key version milestones marked VMDK's technical progression: Version 1 (1999), the initial version supporting basic flat and sparse disk structures; Version 2, added in the mid-2000s, introduced support for disk encryption in hosted products; and Version 3, introduced in 2009 with ESX 4.0, added changed block tracking for efficient incremental backups and replication. These updates aligned with VMware's product releases, such as ESX Server advancements.[10] In 2023, Broadcom completed its acquisition of VMware on November 22, assuming maintenance responsibilities for VMDK as part of the broader ecosystem. As of 2025, the format remains actively supported, ensuring compatibility with vSphere 8 and later versions for ongoing virtual machine deployments.[11][12]

Technical Overview
File Components
A VMDK virtual disk consists of multiple files that together form the complete disk image, with the core components being a descriptor file and one or more data files. The descriptor file, typically named with a .vmdk extension (e.g., vmname.vmdk), is a text-based metadata file that defines the overall structure, including the disk's geometry, adapter type, and references to the data files. It specifies the total size of the virtual disk and links to the extents where the actual data resides.[1] Data files store the raw contents of the virtual disk and vary based on the configuration. For preallocated or dense disks, a single flat data file (e.g., vmname-flat.vmdk) holds all the data in a contiguous format, providing efficient access similar to a physical drive. In contrast, split configurations divide the data into multiple smaller files, such as vmname-s001.vmdk, vmname-s002.vmdk, and so on, each limited to a maximum of 2 GB to accommodate file systems with size restrictions. Monolithic disks use a single data file for simplicity and performance on systems without such limits, while split disks facilitate easier transfer and management across networks or storage with constraints.[1][13] Additional files support specific operations like snapshots and concurrency control. Snapshot files, often named with a -delta.vmdk suffix (e.g., vmname-000001-delta.vmdk), capture changes to the disk after a snapshot is taken, operating in a copy-on-write manner to preserve the original data. Lock files, with a .lck extension (e.g., vmname.vmdk.lck), are created to prevent concurrent access and ensure data integrity during virtual machine operations, indicating an active session or host ownership.[1][14] The overall architecture distinguishes between grain-based extents for sparse disks, which allocate space dynamically in fixed-size grains (typically 64 KB) to support efficient growth, and flat extents for dense disks, which preallocate the full capacity upfront. The descriptor file centralizes the definition of the total disk size and extent types, enabling the virtual disk to emulate a standard block device to the guest operating system.[1]
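As a concrete illustration, a split sparse disk with one snapshot for a virtual machine named "vmname" might comprise a file set along the following lines (a hypothetical layout; exact names and counts depend on the product and configuration):

    vmname.vmdk                 descriptor file (text metadata)
    vmname-s001.vmdk            first split sparse data extent (up to 2 GB)
    vmname-s002.vmdk            second split sparse data extent
    vmname-000001.vmdk          snapshot descriptor
    vmname-000001-delta.vmdk    snapshot delta (copy-on-write) data
    vmname.vmdk.lck             lock file guarding concurrent access

Versioning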
The VMDK file format has three versions, with each subsequent version building on the previous ones to add advanced features while ensuring compatibility with earlier implementations. The version is specified in the descriptor file using a line such as "version=1", "version=2", or "version=3". Virtual disk tools and hypervisors must support all versions up to 3 to handle legacy and modern VMDK files without issues.[10] Version 1 is the foundational version of the VMDK format, offering basic support for flat, preallocated virtual disks without advanced provisioning or tracking capabilities. It was introduced around 2000 with early VMware products, including the initial releases of VMware ESX Server. This version remains fully supported for reading and writing in all current VMware tools, such as vixDiskLib.[10] Version 2 extended the format by adding support for disk encryption, primarily for hosted virtualization products like VMware Workstation and VMware Fusion. Introduced around 2005, this version enables secure virtual disks in desktop environments, though encrypted VMDK files are treated as version 1 on ESX servers, where encryption is not implemented. Version 2 files can be transferred to ESX and function as unencrypted version 1 disks, ensuring broad interoperability.[10] Version 3, the current standard since 2009, introduced persistent Changed Block Tracking (CBT) to efficiently identify modified disk blocks for backups and replication. First appearing in ESX 4.0, it requires VMFS datastores and is essential for advanced vSphere 6 and later features like optimized data protection. The version field changes to 3 when CBT is enabled and reverts to 1 when disabled; the descriptor includes a "changeTrackPath" line pointing to the CBT file (e.g., *-ctk.vmdk). This version also supports sparse provisioning with grain directories for thin disks and multi-extent configurations, along with compression and encryption enhancements.[10] Backward compatibility is a core design principle, with modern hypervisors and tools required to handle version 1 files seamlessly, often by ignoring or emulating higher-version features. As of 2025, no versions beyond 3 have been released, and version 3 remains the maximum supported in vSphere environments, with legacy version 1 support preserved for older ESX deployments.[10]
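For instance, a descriptor for a disk with CBT enabled might contain lines of the following form (an illustrative sketch based on the fields described above; the tracking file name varies per disk):

    version=3
    changeTrackPath="vmname-ctk.vmdk"

Format Specification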
Descriptor File
The VMDK descriptor file is a plain text file that contains essential metadata for interpreting and accessing the virtual disk's structure and contents. It serves as the primary entry point for hypervisors and virtualization software, allowing them to locate associated data files, verify disk integrity, and determine access parameters without directly examining the binary data extents. The file is case-insensitive and uses a simple line-based format, where lines beginning with a hash mark (#) are treated as comments, and the rest consist of key-value pairs or structured declarations separated by sections. This format enables easy parsing and manual editing when necessary, such as during recovery operations.[1] The descriptor file is organized into three main sections: a header with core identifiers, an extents description listing data file references, and a disk database for additional configuration details. In the header, the version field specifies the VMDK format version, typically set to 1 for basic compatibility or 3 when features like persistent Changed Block Tracking (CBT) are enabled on VMFS datastores; version 3 includes an optional changeTrackPath field pointing to a *-ctk.vmdk file for tracking modified blocks. The CID (Content ID) is a 32-bit hexadecimal value that uniquely identifies the disk and changes upon first modification to ensure consistency checks, while parentCID references the parent's CID (or ffffffff for root disks) to support snapshot chains. The createType field indicates the provisioning method used during creation, such as "vmfs" for VMFS-based sparse disks, "monolithicSparse" for single-file sparse disks, or "twoGbMaxExtentSparse" for multi-file sparse layouts limited to 2 GB per extent. For snapshot or delta disks, the parentFileNameHint provides the relative path to the parent descriptor file, facilitating chain resolution. These fields collectively enable the software to reconstruct the disk hierarchy and validate linkages.[10][1][15]
The extents section lists all data files (extents) that comprise the virtual disk, with each line following the syntax: access_mode sector_count extent_type "filename" [offset]. The access_mode is either "RW" for read-write or "RO" for read-only, followed by the number of sectors (each 512 bytes) in the extent. Extent types include "FLAT" for preallocated raw data files, "SPARSE" for growable files with metadata for unallocated areas (often using grain tables), or "VMFSSPARSE" for VMFS-optimized sparse extents in snapshots. For flat extents, an optional offset parameter specifies the starting byte position within the file, as in the example line RW 63 FLAT "disk-flat.vmdk" 0, which defines a read-write flat extent of 63 sectors from the named file beginning at offset 0. Multiple extents can be declared for multi-file disks, such as those split for legacy 2 GB limits. This section directly maps logical disk addresses to physical file locations and types.[1][15]
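As an illustration, a disk split into roughly 2 GB sparse extents (createType "twoGbMaxExtentSparse") might declare its extents as follows; the sector counts shown are hypothetical:

    RW 4192256 SPARSE "vmname-s001.vmdk"
    RW 4192256 SPARSE "vmname-s002.vmdk"
    RW 2101248 SPARSE "vmname-s003.vmdk"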
The disk database section, marked by a #DDB comment, stores VMware-specific configuration as key-value pairs prefixed with ddb., such as adapter type (ddb.adapterType = "ide" or "lsilogic"), virtual hardware version, and disk geometry for legacy BIOS compatibility. Geometry fields include ddb.geometry.cylinders, ddb.geometry.heads, and ddb.geometry.sectors (commonly 16 heads and 63 sectors per track, with cylinders calculated from total size), which emulate physical CHS addressing for older operating systems. Advanced fields may include ddb.longContentID, an extended content identifier used in content-based checksumming, and, for sparse extents in version-compatible disks, a compression algorithm reference such as DEFLATE (per RFC 1951) to reduce storage for grain data. The encoding="UTF-8" declaration, often present in version 3 files, ensures proper handling of filenames with international characters. Hypervisors parse the descriptor first to mount extents, apply access flags, and initialize the virtual disk for I/O operations, ensuring seamless integration across VMware environments.[1][10][15]
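Putting the three sections together, a minimal descriptor for a small monolithic flat disk could look like the following sketch (all values are illustrative rather than taken from a real disk; a 4 GiB disk is assumed, giving 8,388,608 sectors and cylinders calculated as total sectors divided by heads × sectors per track):

    # Disk DescriptorFile
    version=1
    encoding="UTF-8"
    CID=fffffffe
    parentCID=ffffffff
    createType="monolithicFlat"

    # Extent description
    RW 8388608 FLAT "disk-flat.vmdk" 0

    # The Disk Data Base
    #DDB
    ddb.adapterType = "lsilogic"
    ddb.geometry.cylinders = "8322"
    ddb.geometry.heads = "16"
    ddb.geometry.sectors = "63"
    ddb.virtualHWVersion = "14"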
Data Files and Extents
In the VMDK format, extents represent logical divisions of the virtual disk, where each extent points to a physical file or a specific range within a file or device, such as flat files, sparse files, or raw device mappings (RDM). Note that grain sizes and structures differ between hosted products (e.g., Workstation, using SPARSE with 64 KB grains) and server environments (e.g., ESXi, using VMFSSPARSE with 512-byte grains for optimized thin provisioning and snapshots).[16] These extents enable the virtual disk to be composed of multiple storage components, facilitating flexible data organization without embedding all metadata in a single file.[16] For sparse extents, which support dynamic allocation, data is organized into grains of 64 KB (65,536 bytes), equivalent to 128 sectors of 512 bytes each (the default for hosted sparse extents).[16] A grain directory holds pointers to the grain tables (their number varies with total disk size); each grain table is an array of 512 pointers to individual grains, and together these structures map virtual blocks to physical locations by indexing the allocated data blocks within the extent file.[16] Unallocated grains are represented by zeroed entries, allowing the hypervisor to return zeros on reads or copy data from parent disks in snapshot chains until a write triggers allocation.[16] Flat extents, in contrast, consist of a contiguous binary data file, typically named with a "-flat.vmdk" suffix, containing no internal metadata for block mapping.[16] Sectors in a flat extent map directly to file offsets, with the virtual disk's logical block address (LBA) translating one-to-one to the physical offset in bytes (LBA × 512).[16] This preallocated structure ensures efficient sequential access but requires the full disk capacity to be reserved upfront.[16] The hypervisor accesses data by translating the virtual LBA through the descriptor file to the appropriate extent offset.[16] For sparse extents, this involves computing the grain number as \lfloor \frac{\text{LBA}}{128} \rfloor (using the default grain size in sectors), the grain table index as \lfloor \frac{\text{grain number}}{512} \rfloor, and the position in the grain table as \text{grain number} \bmod 512 to locate the grain offset.[16] Flat extents bypass this indirection, using direct arithmetic for offset calculation.[16] Multiple extents within a single virtual disk have been supported since version 1, as illustrated in the sketch below.
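The translation arithmetic can be expressed as a short Python sketch (a simplified illustration of the calculations above, assuming the default 128-sector grains and 512-entry grain tables; it omits on-disk header parsing and parent-chain lookups):

    SECTOR_SIZE = 512
    GRAIN_SECTORS = 128   # default grain size: 64 KB = 128 sectors
    GT_ENTRIES = 512      # pointers per grain table

    def flat_offset(lba: int) -> int:
        """Flat extent: the LBA maps one-to-one to a byte offset."""
        return lba * SECTOR_SIZE

    def sparse_location(lba: int) -> tuple[int, int, int]:
        """Sparse extent: locate the grain holding an LBA.
        Returns (grain directory index, grain table index, byte offset in grain)."""
        grain = lba // GRAIN_SECTORS      # grain number
        gd_index = grain // GT_ENTRIES    # which grain table holds the entry
        gt_index = grain % GT_ENTRIES     # entry within that grain table
        offset_in_grain = (lba % GRAIN_SECTORS) * SECTOR_SIZE
        return gd_index, gt_index, offset_in_grain

For example, LBA 100,000 falls in grain 781, reached through grain directory entry 1 and grain table entry 269; a zeroed grain table entry at that position would indicate an unallocated block, which reads as zeros or is fetched from the parent disk in a snapshot chain.

Provisioning Types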
Thin Provisioning
Thin provisioning in the VMDK format enables the dynamic allocation of storage space, where a virtual disk initially consumes only a minimal amount of host storage (typically just the metadata and header information) and expands on demand as the virtual machine writes data to it. This approach uses sparse files on the host file system (VMFS in ESXi environments), avoiding the pre-allocation of the full provisioned capacity.[17][18] In vSphere/ESXi, the implementation relies on the VMFS file system's support for thin-provisioned files, with the VMDK descriptor specifying the "VMFS" extent type and "thin" provisioning. The data file (e.g., disk-flat.vmdk) is created as a sparse file that grows incrementally as blocks are written, returning zeros for unread areas without allocating physical space until needed. This feature has been supported since ESX Server 3.0 (2007), using VMDK format version 1. For snapshot delta disks and in hosted products like VMware Workstation, a "SPARSE" extent type is used instead, organizing data into grains, fixed-size blocks with a default size of 64 KB (128 sectors of 512 bytes each), managed via grain directories and tables (primary and secondary for redundancy) that map allocated regions. Unallocated grains are marked with zero entries and filled with zeros on first write.[17][18][1][19] Key advantages include highly efficient storage utilization, allowing for overprovisioning in shared datastores where multiple virtual machines can be allocated more space than physically available, as actual usage determines consumption. For example, a 40 GB thin-provisioned disk might initially occupy only 2 GB, enabling rapid virtual machine deployment since creation involves minimal I/O overhead compared to pre-allocating full space. This makes it particularly suitable for environments with variable workloads and abundant storage capacity.[17][18] However, thin provisioning introduces potential drawbacks, such as performance overhead during initial writes due to the need for on-the-fly block allocation and SCSI reservations, which can lead to contention in high-I/O scenarios. Additionally, without proper monitoring, unchecked growth risks datastore exhaustion, potentially causing virtual machine failures if physical storage is depleted before alerts are addressed. Once space is allocated to a block, it cannot be reclaimed automatically, though tools like vmkfstools with the -K option can punch holes in the file for space recovery after guest OS deletion (supported since vSphere 5.0 with UNMAP).[17][20] Configuration of thin provisioning occurs during virtual disk creation in VMware environments, specified via the vSphere Client or command-line tools like vmkfstools with the -d thin option (e.g., vmkfstools -c 10G -d thin disk.vmdk). Existing thick disks can be converted to thin using vmkfstools -i source.vmdk destination.vmdk -d thin, which clones and reprovisions the extent while preserving data. The resulting descriptor file explicitly declares the thin nature, ensuring compatibility with ESXi hosts supporting VMFS or NFS datastores.[17][21]
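As a usage sketch, the following ESXi shell commands (paths and sizes illustrative) create a thin disk, inspect its actual allocation, and convert an existing disk to thin, mirroring the options described above:

    vmkfstools -c 40G -d thin /vmfs/volumes/datastore1/vm1/vm1.vmdk
    du -h /vmfs/volumes/datastore1/vm1/vm1-flat.vmdk    # reports blocks actually allocated, not the provisioned 40 GB
    vmkfstools -i /vmfs/volumes/datastore1/vm2/vm2.vmdk /vmfs/volumes/datastore1/vm2/vm2-thin.vmdk -d thin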
Thick Provisioning Variants
Thick provisioning in the VMDK format preallocates the entire virtual disk capacity on the host storage system during creation, utilizing flat extents to provide predictable performance by eliminating runtime allocation overhead.[22] This approach contrasts with thin provisioning by reserving space immediately, reducing the risk of overcommitment failures in dense environments.[23] The lazy zeroed variant of thick provisioning allocates the full disk space at creation but defers zeroing of data blocks until the first write operation to each block.[22] This results in quicker disk creation times, as the zeroing process (intended to overwrite any residual data from previous uses) is performed lazily on demand.[23] However, initial writes may incur higher latency and lower throughput due to this on-the-fly zeroing, though subsequent operations achieve performance comparable to other thick formats.[23] In environments with VAAI-capable storage arrays, hardware offloading can mitigate these initial performance penalties for lazy zeroed disks.[23] Eager zeroed thick provisioning, in contrast, allocates the disk space and proactively zeros all blocks during creation, ensuring no residual data exposure and eliminating zeroing delays for all writes.[22] While this extends provisioning time (potentially significantly for large disks), it delivers superior first-write performance, making it ideal for latency-sensitive applications.[23] This variant is mandatory for vSphere Fault Tolerance, where synchronized secondary VMs require fully zeroed disks to maintain data integrity in continuous mirroring scenarios.[20] In the VMDK descriptor file, both thick provisioning variants employ flat extents described with the "FLAT" type, specifying read-write (RW) access, extent size in sectors, the backing data file (e.g., -flat.vmdk), and an offset of 0 for preallocated layouts.[1] Creation and management occur via the vmkfstools utility, using the -c option with -d eagerzeroedthick for eager zeroed disks or -d zeroedthick for lazy zeroed ones, alongside parameters for size and datastore path.[24] Resizing eager zeroed disks via vmkfstools preserves the format with the --eagerzero flag, though GUI extensions may revert portions to lazy zeroed.[25] Lazy zeroed thick disks suit general-purpose virtual machines, such as development or testing environments, where rapid deployment outweighs minor initial I/O overhead.[23] Eager zeroed disks are preferred for I/O-intensive workloads like databases or real-time applications, as well as clustered setups including vSphere Fault Tolerance or high-availability configurations in vSphere HA, ensuring consistent performance and security.[23][20]
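For reference, creation of the two variants on an ESXi host might look as follows (sizes and paths are illustrative):

    vmkfstools -c 100G -d zeroedthick /vmfs/volumes/datastore1/app/app.vmdk       # lazy zeroed
    vmkfstools -c 100G -d eagerzeroedthick /vmfs/volumes/datastore1/db/db.vmdk    # eager zeroed (required for Fault Tolerance)

Compatibility and Usage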
VMware Integration
VMDK files serve as the primary virtual disk format across VMware's core products, including the vSphere platform with its ESXi hypervisor, as well as desktop hypervisors like VMware Workstation and VMware Fusion. In vSphere environments, VMDK files are stored on VMFS or NFS datastores, enabling efficient management of virtual machine storage in clustered setups. These files encapsulate the virtual hard disk data and metadata, allowing seamless integration with ESXi hosts for running workloads on shared or local storage infrastructures. Workstation and Fusion utilize VMDK for local virtual machines, supporting compatibility with vSphere through file import and export functionalities. Management of VMDK files within VMware products occurs via graphical and command-line tools. The vSphere Client facilitates VMDK creation during virtual machine deployment and supports conversions, such as transforming thin-provisioned disks to thick-provisioned ones by inflating them to full capacity. For advanced operations, the vmkfstools command-line utility, available on ESXi hosts, enables cloning of VMDK files to create duplicates, inflating thin disks to eager zeroed thick format for performance optimization, and shrinking sparse disks by reclaiming unused space after guest-level defragmentation. Key features enabled by VMDK integration include snapshots, linked clones, and live migrations. Snapshots preserve virtual machine states by generating delta VMDK files that capture changes since the snapshot point, allowing non-disruptive backups or testing without altering the base disk. Linked clones, built on snapshot technology, share the parent VMDK's base layers while using delta files for unique changes, optimizing storage in scenarios like virtual desktop infrastructure. vMotion supports live migration of running virtual machines, including seamless transfer of VMDK files via Storage vMotion to relocate disks between compatible datastores without downtime. Best practices for VMDK usage emphasize alignment with datastore block sizes to ensure optimal I/O performance and avoid fragmentation on VMFS volumes. Administrators are advised to enable Changed Block Tracking (CBT) on virtual machines to facilitate efficient incremental backups by identifying only modified blocks in VMDK files, reducing backup windows and storage overhead. Following Broadcom's acquisition of VMware in late 2023, VMDK support has continued uninterrupted in vSphere 8.0 and later versions, with enhancements to virtual machine encryption that extend native and vSphere Native Key Provider (vNKP) protections to VMDK files for improved data security in transit and at rest.
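The vmkfstools operations mentioned above might be invoked along these lines (illustrative paths; exact option behavior is documented per ESXi release):

    vmkfstools -i vm.vmdk clone.vmdk -d thin    # clone a disk, reprovisioning it as thin
    vmkfstools -j vm.vmdk                       # inflate a thin disk to eager zeroed thick
    vmkfstools -K vm.vmdk                       # punch holes to reclaim zeroed space in a thin disk

Third-Party Support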
VMDK files enjoy broad compatibility with third-party hypervisors, enabling read and write operations in environments outside VMware ecosystems. Oracle VM VirtualBox provides full read/write support for VMDK images, including dynamic allocation and differencing for snapshots, through its VBoxManage command-line tool and graphical interface, allowing seamless attachment as virtual hard disks.[26] Similarly, QEMU supports VMDK as a disk image format via the -drive format=vmdk option, accommodating VMware versions 3 and 4 with subformats like monolithic sparse and two gigabyte maximum extent for handling larger files.[27] In contrast, Microsoft Hyper-V offers only partial support, limited to importing VMDK files through conversion tools or System Center Virtual Machine Manager, as it does not natively execute VMDK without transforming it to VHD or VHDX formats.[28]
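For example, an existing VMDK can be attached to a VirtualBox VM or booted directly in QEMU with commands of the following form (VM and controller names are illustrative):

    VBoxManage storageattach "vmname" --storagectl "SATA" --port 0 --device 0 --type hdd --medium disk.vmdk
    qemu-system-x86_64 -drive file=disk.vmdk,format=vmdk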
Several conversion utilities facilitate VMDK interoperability across platforms. The qemu-img tool, part of the QEMU suite, enables direct conversion of VMDK files to formats like QCOW2 for KVM or VHD for Hyper-V, preserving data integrity during migrations without requiring full VM exports.[29] StarWind V2V Converter similarly supports VMDK as both source and target, allowing cross-format migrations to VHD/VHDX, QCOW2, or IMG/RAW, with options for thin and thick provisioning to optimize storage.[30]
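Typical qemu-img conversions take the following form (file names illustrative; "vpc" is qemu-img's name for the VHD format):

    qemu-img convert -f vmdk -O qcow2 disk.vmdk disk.qcow2    # VMDK to QCOW2 for KVM
    qemu-img convert -f vmdk -O vpc disk.vmdk disk.vhd        # VMDK to VHD for Hyper-V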
Despite this compatibility, third-party tools exhibit limitations with advanced VMDK features. Many implementations, including QEMU and VirtualBox, align with the VMDK specification up to version 3, handling basic flat and sparse extents but lacking full support for advanced features such as certain snapshot configurations. Multi-extent configurations, useful for splitting large disks, receive partial handling via subformats like twoGbMaxExtentSparse, but complex chaining or snapshots on VMDK often require conversion to native formats like QCOW2 to avoid errors.[27]
The adoption of VMDK within the Open Virtualization Format (OVF) standard since 2008 has enhanced its portability, packaging VMs with VMDK disks for distribution across heterogeneous environments.[8] This enables direct import into public clouds, such as Amazon Web Services, where VM Import/Export accepts VMDK images to create EC2 instances or AMIs.[31] Google Cloud Compute Engine similarly supports VMDK imports via the gcloud compute images import command, converting them to persistent disks for scalable VM deployment.
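A Google Cloud import might be invoked as follows (the image name, bucket path, and --os value are illustrative):

    gcloud compute images import my-image --source-file=gs://my-bucket/disk.vmdk --os=ubuntu-2204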
As of 2025, Proxmox VE versions 8 and later, including the 9.0 release, have improved VMDK handling through enhanced import workflows and snapshot capabilities. The qm importdisk command now better supports VMDK migration by converting to QCOW2 or raw formats, with Proxmox VE 9.0 introducing volume-chain snapshots on thick-provisioned LVM storage, allowing consistent backups of imported VMDK-based VMs without full reconfiguration.[32]
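An import on Proxmox VE might look like this (VM IDs and storage names are illustrative; QCOW2 output requires file-based storage):

    qm importdisk 100 disk.vmdk local-lvm                # imported as raw onto LVM storage
    qm importdisk 101 disk.vmdk local --format qcow2     # imported as QCOW2 onto directory storage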