Google Compute Engine
Google Compute Engine is an infrastructure-as-a-service (IaaS) offering from Google Cloud Platform that enables users to create and manage virtual machine (VM) instances and bare metal servers on Google's global data center infrastructure.[1] It provides scalable compute resources, allowing customers to run diverse workloads such as web applications, databases, batch processing, and high-performance computing without managing underlying hardware.[2] Launched in preview on June 28, 2012, and reaching general availability on December 2, 2013, Compute Engine uses a KVM-based hypervisor to deliver reliable, self-managed virtual machine instances with options for Linux and Windows operating systems, while bare metal servers provide direct hardware access without virtualization.[3][4][5]

Compute Engine supports a variety of machine types tailored to specific needs, including general-purpose (e.g., N1, E2), compute-optimized (e.g., C2), memory-optimized (e.g., M2), and accelerator-optimized instances equipped with GPUs, Google's custom Tensor Processing Units (TPUs), or Arm-based processors like Axion for AI and machine learning tasks.[1] Users can customize configurations for vCPUs, memory, and storage, with options like Persistent Disk for block storage, Local SSD for high-performance temporary storage, and Hyperdisk for advanced throughput.[6] The service guarantees 99.9% uptime for most instances and 99.95% for memory-optimized VMs, featuring live migration to minimize downtime during maintenance.[2]

Integrated with other Google Cloud services, Compute Engine facilitates container orchestration via Google Kubernetes Engine (GKE), data analytics with BigQuery, and storage with Cloud Storage, enabling hybrid and multi-cloud architectures.[1] Pricing models include pay-as-you-go, Spot VMs with discounts of up to 91% for interruptible workloads, and committed use discounts for predictable savings, with a free tier offering one e2-micro instance monthly.[2] Available across 42 regions and 127 zones worldwide as of 2025, the service emphasizes security through features like Shielded VMs, customer-managed encryption keys, and compliance with standards such as SOC, PCI DSS, and HIPAA.[7]

History
Launch and Early Development
Google Compute Engine was announced on June 28, 2012, during the Google I/O developer conference as a limited preview service within the Google Cloud Platform (GCP). This launch marked Google's entry into the infrastructure-as-a-service (IaaS) market, offering users the ability to provision and manage virtual machines (VMs) on its global infrastructure without the need to handle underlying hardware.[8] The service was positioned to compete with offerings like Amazon EC2, emphasizing Google's strengths in scalability, performance, and cost-efficiency, with claims of providing 50% more compute power per dollar than competitors.

At launch, Google Compute Engine focused on delivering KVM-based virtual machines primarily for Linux operating systems, enabling developers and businesses to run large-scale workloads such as web applications, batch processing, and data analysis.[9] Initial VM configurations supported up to 8 virtual CPUs and 3.75 GB of RAM per core, with persistent block storage for data durability.[10] Key early integrations with other GCP services, such as Google Cloud Storage, allowed users to store and access unstructured data directly from VMs, supporting workflows for applications requiring object storage alongside compute resources. Access to the limited preview required sign-up and was initially restricted to selected developers, with no public pricing or general availability timeline disclosed.

In early 2013, the service transitioned from limited preview to a broader beta phase, ending the free trial period and requiring users to provide credit card details for continued access.[11] By May 2013, Google opened the beta to all users via the Google Cloud Console, expanding availability and introducing initial machine type offerings like the n1-standard series, which balanced CPU and memory for general-purpose workloads (e.g., n1-standard-1 with 1 vCPU and 3.75 GB RAM).[12][13] During this beta period, support for additional Linux distributions grew, and foundational features like live migration for maintenance were tested to ensure high availability. Windows support was introduced in limited preview later in development, broadening OS compatibility.[14]

The service reached general availability on December 2, 2013, with a 99.95% monthly uptime SLA, 24/7 support, and reduced pricing to encourage broader adoption.[4] This milestone solidified Google Compute Engine's role in GCP, transitioning it from experimental preview to a production-ready IaaS platform capable of supporting enterprise-scale deployments.[15]

Major Milestones and Updates
Following its general availability in 2013, Google Compute Engine saw key enhancements in operating system support and pricing models. In April 2014, the service introduced sustained use discounts, which automatically apply up to a 30% reduction for instances running more than 25% of a billing month, optimizing costs for long-running workloads without requiring commitments.[16] Windows Server support launched in limited preview that same year, enabling users to run Microsoft workloads on the platform, with expanded capabilities—including license mobility for existing on-premises licenses—announced on December 8, 2014.[14]

Compute options diversified further in 2015 and 2016 to address interruptible and accelerated workloads. Preemptible virtual machines (now known as Spot VMs) debuted in beta on May 18, 2015, offering up to 70% discounts compared to on-demand pricing for batch jobs tolerant of interruptions (current Spot VMs offer discounts of up to 91%), and achieved general availability in September 2015.[17] Initial GPU-accelerated instances were announced on November 16, 2016, powered by NVIDIA Tesla K80 cards, and became available worldwide in early 2017 to support machine learning, data analytics, and high-performance computing tasks.[18]

Infrastructure growth accelerated through the late 2010s, with regions and zones expanding to enhance global availability and reduce latency. By mid-2020, Google Cloud had grown to 24 regions across 73 zones in 17 countries, up from just a handful at launch, facilitating broader adoption for distributed applications.[19] Integration with AI and machine learning advanced notably in 2020, when Confidential Computing launched with Confidential VMs on Compute Engine; these use hardware-based trusted execution environments to encrypt data in use, protecting sensitive AI/ML models and processing without performance overhead.[20]

Recent updates from 2024 to 2025 emphasize performance for AI-driven and specialized workloads. In July 2024, Hyperdisk ML entered general availability as a high-throughput block storage option tailored for machine learning, delivering up to 1,200,000 MiB/s read throughput per volume to accelerate data loading for training pipelines across up to 2,500 attached VMs.[21] September 2025 brought general availability of Flex-start VMs, which support short-duration tasks up to seven days using a flexible provisioning model that consumes Spot quota for cost savings on bursty or experimental workloads.[22] The G4 accelerator-optimized machine series followed in October 2025, featuring NVIDIA RTX PRO 6000 Blackwell GPUs for graphics-intensive applications like virtual desktops and Omniverse simulations, available in multiple regions with low-latency networking.[23]

November 2025 marked further hardware innovations, with the N4D VM series achieving general availability on November 7, powered by fifth-generation AMD EPYC Turin processors and offering up to 96 vCPUs, 768 GB of DDR5 memory, and Titanium I/O for general-purpose tasks in regions like us-central1.[21] On November 6, the N4A series entered preview, utilizing Google's custom Axion processors based on Arm Neoverse N3 architecture, with configurations up to 64 vCPUs and 512 GB DDR5 for efficient, scalable AI inference and web serving in limited regions such as us-central1 and europe-west3.[21] These developments underscore ongoing efforts to balance cost, performance, and security in cloud computing.

Overview and Core Concepts
Virtual Machine Instances
A virtual machine (VM) instance in Google Compute Engine is a self-managed virtual server that runs on Google's infrastructure using a KVM-based hypervisor, allowing users to deploy and operate workloads on customizable compute resources.[6][1] These instances support both Linux and Windows operating systems and can be configured for a wide range of applications, from web servers to high-performance computing tasks.[6]

The lifecycle of a Compute Engine VM instance progresses through distinct states, including provisioning (where resources are allocated), running (when the instance is active and operational), stopping (where the instance is shut down but resources are preserved), and terminating (where the instance is deleted and resources are released).[24] Users can monitor and manage these states to ensure efficient resource utilization and application availability throughout the instance's duration.[24] Instances are created through the Google Cloud Console for a graphical interface, the gcloud CLI for command-line automation, or the Compute Engine API for programmatic integration, with key steps involving selection of a machine type, bootable image, and deployment zone.[25][26] This process enables rapid deployment tailored to specific workload requirements, such as compute capacity and geographic placement.[25]

For scalable deployments, Compute Engine supports instance groups, which manage collections of identical VMs; managed instance groups (MIGs) provide advanced features like automatic healing, rolling updates, and autoscaling based on metrics such as CPU utilization or custom load balancing.[27][28] MIGs ensure high availability by distributing instances across multiple zones and dynamically adjusting group size to match demand.[27] In September 2025, Google introduced Flex-start VMs in general availability, a feature for single-instance deployments with runtime limits up to seven days, optimized for bursty workloads like AI training or batch processing through a queuing system that improves resource access efficiency.[22][29]

Compute Engine also offers bare metal instances, which provide direct hardware access without virtualization overhead, catering to low-latency applications such as financial trading or real-time analytics that require maximal performance and minimal interference.[5][2] VM instances can attach to persistent storage options for durable data management, with details on these attachments covered in dedicated storage sections.[6]
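To make the creation workflow concrete, the following gcloud sketch provisions a VM, checks its lifecycle state, and then stops and deletes it; the instance name demo-vm, the zone, and the machine type are placeholder values rather than recommendations.

```bash
# Create a small Debian VM in a single zone (names and sizes are illustrative).
gcloud compute instances create demo-vm \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --image-family=debian-12 \
    --image-project=debian-cloud

# Inspect the instance's lifecycle state (PROVISIONING, RUNNING, STOPPING, TERMINATED, ...).
gcloud compute instances describe demo-vm \
    --zone=us-central1-a \
    --format="value(status)"

# Stop, then delete, the instance when finished.
gcloud compute instances stop demo-vm --zone=us-central1-a
gcloud compute instances delete demo-vm --zone=us-central1-a --quiet
```

Basic Resource Units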
In Google Compute Engine, the fundamental resources are measured primarily in terms of virtual CPUs (vCPUs) and gigabytes (GB) of memory, which form the core building blocks for virtual machine instances. Historically, Google introduced the Google Compute Engine Unit (GCEU) as an abstraction for CPU capacity, where 2.75 GCEUs represented the compute power equivalent to one logical CPU core on an n1-standard-1 instance; however, this metric has been largely superseded in modern usage by direct vCPU and memory allocations for simplicity and alignment with hardware capabilities.

A vCPU in Compute Engine represents a single hardware hyper-thread (or thread) on the underlying physical processors, which include Intel Xeon Scalable, AMD EPYC, and Arm-based (Tau) CPUs. By default, simultaneous multithreading (SMT, also known as hyper-threading) is enabled, allowing two vCPUs to share one physical core, thereby providing efficient resource utilization without dedicating full cores unless specified otherwise via configuration options.[30][31] vCPUs can be allocated from 1 up to 384 per instance, depending on the machine type and series, with the exact mapping to physical hardware determined by the selected CPU platform.[32]

Memory is allocated in increments of GB and is closely tied to vCPU counts, with predefined ratios varying by machine family to balance performance needs. For general-purpose standard machine types, the typical ratio is 4 GB of memory per vCPU, though ranges can extend from 3 to 7 GB per vCPU; specialized families like high-memory types offer up to 24 GB per vCPU, while high-CPU types provide as low as 0.9 GB per vCPU to prioritize processing power.[33] Custom allocations allow flexibility within these bounds, ensuring memory scales proportionally to computational demands.

Disk resources are provisioned as block storage in GB, with Persistent Disks serving as the primary unit for durable, scalable storage attached to instances; quotas limit total disk size per region, with default limits varying by project and often starting in the terabyte range for standard Persistent Disk, though these can encompass both SSD and HDD variants.[34] Network bandwidth is another key allocatable unit, measured in Gbps for ingress and egress; while ingress is unlimited, egress bandwidth is capped per instance based on machine type—ranging from 1 Gbps for small instances to 200 Gbps for high-performance series—with premium Tier_1 networking options enabling higher sustained throughput for data-intensive workloads.[35]

Compute Engine enforces quotas to manage resource availability, with default limits applied per project and region to prevent overuse; for example, the standard CPU quota (total vCPUs) often starts at 8-24 per region for new projects as of early 2025, alongside corresponding memory quotas, and boot disks have a minimum size of 10 GB.[36] These quotas are visible and adjustable via the Google Cloud console, where users can request increases through a form-based process, typically approved based on usage history and justification to accommodate scaling needs.[37]
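Current quota limits and usage can also be inspected from the command line before provisioning; a brief gcloud sketch, in which the region name is a placeholder and the output projection may need adjusting for a given CLI version:

```bash
# Per-region quotas (vCPUs, disk capacity, in-use addresses) with current usage.
gcloud compute regions describe us-central1 --format="yaml(quotas)"

# Project-wide quotas (images, snapshots, networks, ...) live on the project resource.
gcloud compute project-info describe --format="yaml(quotas)"
```

Infrastructure and Locations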
Regions and Zones
Google Compute Engine organizes its infrastructure into regions and zones to provide geographical distribution, fault tolerance, and compliance options for deployments. A region is an independent geographic area, such as us-central1 in Iowa, United States, that spans one or more physical locations and contains multiple zones.[38] Each region operates independently, allowing users to select locations based on specific needs while ensuring resources within a region can communicate with low latency. As of November 2025, Google Cloud operates 42 regions worldwide, with expansions including new facilities in Europe, such as Stockholm, Sweden (europe-north2), and in North America, such as Querétaro, Mexico (northamerica-south1).[39][40][41]
Zones represent isolated locations within a region, designed to enhance fault tolerance by isolating failures such as power outages or network issues to a single zone without affecting others in the same region. For example, the us-central1 region includes zones like us-central1-a, us-central1-b, and us-central1-c, each hosting a subset of the region's capacity. With 127 zones available as of November 2025, users can deploy instances across multiple zones within a region to achieve high availability, as resources in different zones are engineered to be failure-independent.[38][39]
Selecting regions and zones involves evaluating factors like latency, regulatory compliance, and service availability to optimize performance and meet legal requirements. For instance, to minimize latency for users in Europe, one might choose the europe-west1 region in Belgium, while data residency rules such as the EU's GDPR may necessitate deploying in European regions to keep personal data within the continent. Availability considerations include checking zone-specific quotas and maintenance schedules to ensure uninterrupted operations. Multi-regional resources, such as replicated storage buckets, enable global replication across multiple regions for enhanced durability and accessibility, though their use ties into broader resource scoping policies.[42][43]
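Available locations can be enumerated directly when weighing these factors; a short gcloud sketch, in which the filter value is only an example:

```bash
# List all regions, then the zones belonging to one region.
gcloud compute regions list
gcloud compute zones list --filter="region:us-central1"
```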
Resource Scopes and Placement Policies
In Google Compute Engine, resources are organized into scopes that determine their availability and accessibility across the infrastructure. Zonal resources, such as virtual machine instances, are confined to a single zone within a region and can only interact with other resources in that same zone. Regional resources, including managed instance groups (MIGs), span multiple zones within a single region, enabling broader distribution for improved fault tolerance. Global resources, like custom images and snapshots, are accessible across all regions and zones, facilitating reuse without location-specific constraints.

Placement policies in Compute Engine allow users to control the physical distribution of virtual machines to optimize for reliability, performance, or latency. The compact placement policy groups instances closely together on the same underlying hardware or within the same network topology, reducing inter-instance communication latency, which is particularly useful for tightly coupled workloads like high-performance computing applications. In contrast, the spread placement policy distributes instances across distinct hardware to minimize the risk of correlated failures from hardware or zonal outages, enhancing overall availability for mission-critical services. The default "any" policy imposes no specific constraints, allowing the system to place instances based on availability. These placement policies effectively implement affinity and anti-affinity principles for instance placement. Compact policies enforce affinity by co-locating instances to promote low-latency interactions, while spread policies apply anti-affinity by separating them to avoid single points of failure, thereby supporting strategies for high availability without requiring custom scripting.

At a higher level, Compute Engine resources are managed within a hierarchical structure that aligns with Google Cloud's overall organization. Projects serve as the primary containers for resources, where all Compute Engine instances, disks, and networks are created and billed. Folders provide optional intermediate grouping for projects, enabling structured organization by department or environment, while the organization node at the top represents the root for an entire enterprise, enforcing policies and access controls across the hierarchy. This structure ensures isolated, scalable management of resources while inheriting permissions downward.

A recent enhancement to regional MIGs, introduced in public preview as of November 2025, allows automatic repair of failed virtual machines in an alternate zone within the same region when the primary zone is unavailable. This feature requires enabling update-on-repair and helps maintain instance group health during zonal disruptions, further bolstering availability without manual intervention.[21]
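As an illustration of the compact policy described above, the following hedged gcloud sketch creates a placement policy and attaches it to a new VM at creation time; the policy name, instance name, zone, and machine type are placeholders, and some machine series may require additional scheduling settings.

```bash
# Create a compact (affinity) placement policy in a region.
gcloud compute resource-policies create group-placement low-latency-group \
    --collocation=collocated \
    --region=us-central1

# Create a VM that is subject to the policy.
gcloud compute instances create hpc-node-1 \
    --zone=us-central1-a \
    --machine-type=c2-standard-16 \
    --resource-policies=low-latency-group
```

Compute Resources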
Machine Types
Google Compute Engine offers a variety of predefined machine type families tailored to different workload requirements, balancing vCPU, memory, and other resources for optimal performance and cost-efficiency. These families include general-purpose, compute-optimized, memory-optimized, accelerator-optimized, and storage-optimized types, each with specific series designed for common use cases such as web serving, high-performance computing, in-memory databases, machine learning inference, and high-I/O data processing. Machine types determine the vCPU-to-memory ratios, networking bandwidth, and other capabilities, allowing users to select configurations that align with their application's needs without custom modifications.

The general-purpose machine family, suitable for versatile workloads like web servers, containerized applications, and development environments, encompasses the N1, N2, and N4 series. The N1 series, an earlier generation, supports up to 96 vCPUs with memory ratios of up to 6.5 GB per vCPU and networking bandwidth up to 32 Gbps, providing balanced performance for standard tasks. The N2 series, powered by Intel Cascade Lake processors (with Ice Lake for instances over 80 vCPUs), scales to 128 vCPUs at up to 8 GB of memory per vCPU and up to 32 Gbps networking, offering improved price-performance for medium-scale applications. The N4 series extends this with up to 80 vCPUs at up to 8 GB per vCPU and 50 Gbps networking, while the N4D variant, based on AMD EPYC Turin processors, reaches 96 vCPUs with the same memory ratio and became generally available in November 2025 for enhanced flexibility in general workloads.[44][45]

Compute-optimized machine types, such as the C2 and C3 series, prioritize high-frequency CPUs for demanding tasks including high-performance computing (HPC), batch processing, and game servers. The C2 series delivers up to 60 vCPUs with 4 GB of memory per vCPU and sustained all-core turbo frequencies up to 3.8 GHz, paired with up to 32 Gbps networking for compute-intensive operations. The C3 series advances this capability to 176 vCPUs at up to 8 GB per vCPU, supporting even larger-scale HPC and AI training workloads with networking bandwidth up to 100 Gbps.[46]

Memory-optimized types like the M1 and M2 series are engineered for applications requiring substantial RAM, such as in-memory databases, caching layers, and SAP HANA deployments. The M1 series accommodates up to 160 vCPUs with up to 24 GB of memory per vCPU (totaling over 3.8 TB), and networking up to 32 Gbps to handle data-heavy queries efficiently. The M2 series focuses on ultra-high memory configurations, supporting 208–416 vCPUs with as much as 12 TB of total memory (approximately 28 GB per vCPU in larger instances), ideal for analytics and real-time processing with the same networking bandwidth.[47]

Accelerator-optimized machine types, including the A2, A3, and G2 series, integrate GPUs for graphics rendering, machine learning inference, and generative AI tasks. The A2 series pairs up to 96 vCPUs with 16 NVIDIA A100 GPUs and up to 100 Gbps networking, optimized for large-scale ML training. The A3 series scales to 224 vCPUs with 8 NVIDIA H100 GPUs and exceptional 3,200 Gbps networking, targeting advanced AI workloads. The G2 series, featuring NVIDIA L4 GPUs, supports up to 96 vCPUs with 8 GPUs per instance and 100 Gbps networking, particularly suited for graphics-intensive applications like remote visualization and video processing.
Storage-optimized machine types, represented by the Z3 series, cater to high-I/O workloads such as SQL/NoSQL databases, data analytics, and vector databases requiring rapid local storage access. These instances provide up to 176 vCPUs with 36 TiB of local SSD storage and networking bandwidth up to 100 Gbps, enabling low-latency data throughput for scale-out storage systems.

| Machine Family | Key Series | vCPU Range | Memory Ratio (GB/vCPU) | Max Networking Bandwidth | Primary Use Cases |
|---|---|---|---|---|---|
| General-purpose | N1, N2, N4/N4D | Up to 128 | 6.5–8 | 32–50 Gbps | Web servers, microservices |
| Compute-optimized | C2, C3 | Up to 176 | 4–8 | Up to 100 Gbps | HPC, AI/ML batch jobs |
| Memory-optimized | M1, M2 | Up to 416 | 14–28 | Up to 32 Gbps | In-memory databases |
| Accelerator-optimized | A2, A3, G2 | Up to 224 | Varies (8–16 base) | 100–3,200 Gbps | ML training, graphics |
| Storage-optimized | Z3 | Up to 176 | Varies | Up to 100 Gbps | High-I/O databases, analytics |
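
For hands-on comparison of these families, the predefined machine types available in a zone can be listed and filtered; a sketch in which the zone, filter, and output fields are illustrative:

```bash
# List N2 machine types in one zone with their vCPU and memory dimensions.
gcloud compute machine-types list \
    --zones=us-central1-a \
    --filter="name~'^n2-'" \
    --format="table(name, guestCpus, memoryMb)"
```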
Custom Configurations
Google Compute Engine allows users to create custom machine types that enable precise specification of virtual CPUs (vCPUs) and memory to match specific workload requirements, offering greater flexibility than predefined machine types.[48] For example, a user can configure an instance with exactly 10 vCPUs and 60 GB of memory using the format custom-10-61440 (where memory is specified in MB), which is particularly useful for applications needing non-standard resource ratios, such as memory-intensive databases or compute-light services.[48] Memory allocations must be in multiples of 256 MB, and the total configuration must align with the supported machine series, such as N2 or E2.[44]
Constraints on custom machine types ensure compatibility with underlying hardware. In standard configurations, memory per vCPU ranges from 0.9 GB to 6.5 GB, though this varies by series—for instance, N1 series supports 0.922 GB to 6.656 GB per vCPU.[48] Extended memory options, available for series like N4, N4A, N2, and N1, remove the per-vCPU upper limit, allowing up to 8 GB or more per vCPU (e.g., up to 624 GB total for N1), billed at a premium rate to support workloads like large-scale analytics.[44] vCPUs can generally be specified in multiples of 1 starting from 1, except for certain series like E2, which require multiples of 2 up to 32 vCPUs.[44]
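A minimal gcloud sketch of creating the custom shape described above; the instance name, zone, and image are placeholders, and the machine series is selected explicitly rather than relying on the default:

```bash
# 10 vCPUs and 60 GB (61440 MB) of memory; with the N2 series the resulting
# machine type name is n2-custom-10-61440.
gcloud compute instances create custom-shape-vm \
    --zone=us-central1-a \
    --custom-vm-type=n2 \
    --custom-cpu=10 \
    --custom-memory=60GB \
    --image-family=debian-12 \
    --image-project=debian-cloud
```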
Sole-tenant nodes provide dedicated physical hardware isolation for custom machine types, ensuring that VMs run exclusively on servers reserved for a single project to meet compliance or security needs.[49] These nodes support custom configurations in compatible series like N2, where VMs must match the node's machine series but can vary in size within the node's total capacity (e.g., up to 80 vCPUs and 640 GB of memory on an n2-node-80-640 node type).[49]
Custom machine types integrate with accelerators for enhanced performance in AI and high-performance computing. Users can attach NVIDIA GPUs (e.g., A100 or T4) or Google TPUs to custom VMs in supported series like N1 or A2, enabling tailored setups such as a custom-32-225280 instance with four T4 GPUs for machine learning training.
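A hedged sketch of attaching accelerators to a custom-shaped VM, loosely mirroring the example above but kept within the standard per-vCPU memory limit; the instance name, zone, and image are placeholders, and GPU-equipped VMs must opt out of live migration:

```bash
# Custom N1 shape (default series for the bare custom- prefix) with four T4 GPUs.
# 208 GB stays within the standard 6.5 GB-per-vCPU limit; larger values would
# require extended memory (--custom-extensions).
gcloud compute instances create ml-training-vm \
    --zone=us-central1-a \
    --custom-cpu=32 \
    --custom-memory=208GB \
    --accelerator=type=nvidia-tesla-t4,count=4 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```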
As of November 2025, the Arm-based N4A series, powered by Google's Axion processor on the Arm Neoverse N3 platform, supports custom machine types in preview, offering up to 64 vCPUs and 512 GB of DDR5 memory with extended memory options for cost-effective general-purpose workloads.[50]
Storage Options
Persistent Disks
Persistent Disks are block storage devices in Google Compute Engine that provide durable, high-availability storage independent of virtual machine (VM) instances, allowing data to persist even if the instance is stopped or terminated.[51] They function like physical hard drives but are managed by Google Cloud, offering features such as live attachment and detachment to running VMs without downtime.[51]

Google Compute Engine offers several types of Persistent Disks to suit different workloads, balancing cost, performance, and latency requirements. Standard Persistent Disks (pd-standard) use hard disk drives (HDDs) and are optimized for large-scale sequential read/write operations, such as media serving or data analytics, with performance scaling at 0.75 read IOPS and 1.5 write IOPS per GiB of provisioned space, up to a maximum of 7,500 read and 15,000 write IOPS per instance on larger machines.[52] Balanced Persistent Disks (pd-balanced) employ solid-state drives (SSDs) for a cost-effective mix of performance and price, delivering up to 6 IOPS (read and write) per GiB, with a baseline of 3,000 IOPS and maximums reaching 80,000 IOPS per instance, suitable for general-purpose applications like web servers.[52] SSD Persistent Disks (pd-ssd) provide high-performance storage for demanding workloads such as databases, offering up to 30 IOPS (read and write) per GiB and peaking at 100,000 IOPS per instance, with throughput limits of 1,200 MiBps for reads and writes.[52] For workloads requiring predictable latency, Extreme Persistent Disks (pd-extreme) allow provisioning of up to 120,000 IOPS and 4,000 MiBps throughput for reads, ensuring consistent performance without scaling solely on disk size.[53] Across all types, performance scales with disk size and the number of vCPUs in the attached VM instance, but is capped by per-instance limits to prevent overload.[52]

Persistent Disks can be sized from a minimum of 10 GB to a maximum of 64 TB per volume, with sizes adjustable in 1 GB increments; for greater capacity, multiple disks can be combined using software RAID configurations within the VM.[54] Up to 128 Persistent Disk volumes (including the boot disk) can be attached to a single VM instance, supporting a total attached capacity of up to 257 TiB, which enables scalable storage setups for complex applications.[51]

All Persistent Disks are encrypted at rest by default using Google-managed encryption keys, ensuring data security without additional configuration; alternatively, users can opt for customer-supplied encryption keys (CSEK) to manage their own 256-bit AES keys, providing greater control over encryption for compliance needs, though Google does not store these keys and data becomes inaccessible if they are lost.[55] Data in transit between the disk and VM is also encrypted. For backup, Persistent Disks support incremental snapshots that can be created and managed separately.[51]
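The attach-and-resize workflow can be sketched with gcloud; the disk and instance names and sizes below are placeholders:

```bash
# Create a 500 GB SSD Persistent Disk and attach it to an existing VM.
gcloud compute disks create app-data-disk \
    --zone=us-central1-a \
    --type=pd-ssd \
    --size=500GB

gcloud compute instances attach-disk demo-vm \
    --disk=app-data-disk \
    --zone=us-central1-a

# Grow the disk later without detaching it (disks can only increase in size).
gcloud compute disks resize app-data-disk --size=1TB --zone=us-central1-a
```

Local SSD and Hyperdisk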
Google Compute Engine offers Local SSD as an ephemeral storage option that provides high-performance, low-latency block storage physically attached to the host machine running the virtual machine instance.[56] This storage uses NVMe or SCSI interfaces and is designed for temporary workloads where data persistence is not required, as all data on Local SSD disks is lost when the instance stops, is preempted, or the host encounters an error.[56] Unlike persistent disks, which maintain data independently of the instance lifecycle, Local SSD emphasizes speed over durability and cannot be detached from the instance or used for snapshots.[56]

Local SSD disks come in standard and Titanium variants, with each disk offering 375 GiB of capacity, though Titanium SSD supports up to 6 TiB per disk on certain bare metal configurations.[56] Instances can attach multiple disks, enabling up to 72 TiB of total Local SSD capacity, depending on the machine type and series (e.g., the Z3 series allows 12 disks of 6 TiB each).[56] Performance scales with the number of disks and interface; for example, Titanium SSD on NVMe can deliver up to 9,000,000 read IOPS, 6,000,000 write IOPS, 36,000 MiB/s read throughput, and 30,000 MiB/s write throughput.[56] Common use cases include caching, scratch space for high-I/O applications like databases (e.g., temporary tables in SQL Server), and transient data processing in high-performance computing environments.[56] Limitations include incompatibility with shared-core machine types, the inability to add disks after instance creation, and no support for custom encryption keys or data preservation beyond preview features for live migrations.[56]

Hyperdisk provides a family of durable, high-performance block storage volumes that can be customized for IOPS and throughput independently of capacity, making it suitable for demanding workloads while maintaining data persistence across instance restarts.[57] Available in several types—Balanced, Balanced High Availability, Extreme, Throughput, and ML—Hyperdisk volumes attach directly to instances like physical disks and support features such as regional replication for high availability and, for the ML variant, sharing across multiple read-only instances with limits varying by volume size (up to 2,500 for volumes ≤256 GiB and lower for larger volumes).[57] The Balanced type offers a general-purpose balance with up to 160,000 IOPS and 2,400 MiB/s throughput; Extreme prioritizes IOPS at up to 350,000 with 5,000 MiB/s throughput; and Throughput focuses on sequential access with up to 2,400 MiB/s at lower IOPS.[57]

Hyperdisk ML, optimized for AI and machine learning workloads, delivers the highest performance in the family, with up to 1,200,000 MiB/s throughput and 19,200,000 IOPS, enabling faster model loading and reduced idle time for accelerators in inference and training scenarios.[58] This type supports volumes from 4 GiB to 64 TiB and is particularly useful for immutable datasets like model weights, where multiple instances can access the same volume in read-only mode for large-scale HPC or analytics tasks such as those in Hadoop or Spark.[58] It became generally available in 2024, enhancing support for AI-driven applications.[59]

Limitations for Hyperdisk include restrictions on using the Extreme, ML, or Throughput types as boot disks, zonal-only availability for ML volumes, and the need to adjust performance settings in increments (e.g., throughput changes at most every 6 hours), with the read-only attachment limits by volume size noted above.[57][58]

| Hyperdisk Type | Max IOPS | Max Throughput (MiB/s) | Key Focus |
|---|---|---|---|
| Balanced | 160,000 | 2,400 | General-purpose workloads |
| Extreme | 350,000 | 5,000 | High random I/O |
| Throughput | 9,600 | 2,400 | Sequential access |
| ML | 19,200,000 | 1,200,000 | AI/ML data loading |
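
The provisioning model summarized in the table can be made concrete with gcloud; in this sketch the disk and instance names, sizes, and performance values are illustrative rather than recommended settings:

```bash
# Provision a Hyperdisk Balanced volume with explicit IOPS and throughput,
# independent of its capacity.
gcloud compute disks create analytics-disk \
    --zone=us-central1-a \
    --type=hyperdisk-balanced \
    --size=1TB \
    --provisioned-iops=10000 \
    --provisioned-throughput=400

# Local SSD must be attached at instance creation time; it cannot be added later.
gcloud compute instances create scratch-vm \
    --zone=us-central1-a \
    --machine-type=n2-standard-8 \
    --local-ssd=interface=NVME \
    --image-family=debian-12 \
    --image-project=debian-cloud
```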
Networking and Connectivity
Virtual Private Cloud
Google Compute Engine (GCE) utilizes Virtual Private Cloud (VPC) networks as the foundational networking layer, providing isolated, scalable virtual environments for resources like virtual machine (VM) instances. A VPC network acts as a global, virtual version of a physical network, spanning multiple regions without the need for physical cabling, and enables users to define subnets within specific regions for logical segmentation of resources. These networks support auto-mode or custom-mode configurations, where auto-mode automatically creates subnets in every region, while custom-mode allows manual definition of IP ranges and subnet placements to suit workload requirements.

IP addressing in GCE VPCs includes internal IPv4 and IPv6 addresses assigned to instances, with support for both dual-stack and IPv6-only configurations to accommodate modern networking needs. External IPv4 addresses can be optionally attached to instances for public internet access, while alias IP ranges enable secondary IP assignments to VMs or load balancers without additional network interfaces. Subnets are associated with primary IP ranges (CIDR blocks) and can include secondary ranges for alias IPs, ensuring efficient address management across regional deployments. IPv6 support, introduced in 2022 and expanded thereafter, allows global anycast addresses for improved scalability in IPv6-enabled workloads.[60]

Firewall rules in VPC networks control ingress and egress traffic using distributed, stateful firewalls that apply to all instances within the network. Rules are defined by priority (lower numbers take precedence), direction, and action (allow or deny), with matching based on IP protocols, ports, source/destination IP ranges, and instance tags or service accounts for granular security. For example, a common rule might allow HTTP traffic (port 80) from any source to instances tagged "web-server," while denying all other ingress to minimize exposure. These rules are enforced at the instance level but defined at the VPC level, with a default quota of 1000 rules per project, and hierarchical firewall policies available for enterprise-scale management.[61]

Routes in VPC networks direct traffic flow, including default routes for internet-bound traffic and custom static routes for on-premises or peered network connectivity. The system-generated default route (0.0.0.0/0) handles outbound traffic to the internet via Google's edge routers, while custom routes can specify next-hop types such as VM instances, VPN tunnels, or interconnects with metrics to prioritize paths. Route propagation is automatic for connected networks, ensuring dynamic updates without manual intervention in most cases.

VPC Network Peering enables secure, low-latency connectivity between multiple VPC networks, either within the same project, across projects, or between different organizations, without requiring gateways or VPNs. Peering connections exchange routes automatically (unless disabled), allowing instances in peered networks to communicate using internal IP addresses as if they were in the same network, which is particularly useful for multi-project architectures or hybrid cloud setups. Limitations include no transitive peering (direct connections only) and non-overlapping IP ranges to prevent conflicts. Shared VPCs extend this capability by allowing centralized network administration across projects, where a host project owns the VPC and subnets are shared with attached projects for resource deployment.
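A brief gcloud sketch of the custom-mode setup and the HTTP firewall rule described above; the network, subnet, and range values are placeholders:

```bash
# Custom-mode VPC with one regional subnet.
gcloud compute networks create demo-vpc --subnet-mode=custom

gcloud compute networks subnets create demo-subnet \
    --network=demo-vpc \
    --region=us-central1 \
    --range=10.10.0.0/24

# Allow inbound HTTP from anywhere, but only to instances tagged "web-server".
gcloud compute firewall-rules create allow-http \
    --network=demo-vpc \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:80 \
    --source-ranges=0.0.0.0/0 \
    --target-tags=web-server
```

Load Balancing and IP Management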
Google Cloud Load Balancing provides scalable traffic distribution for Compute Engine instances, supporting various types tailored to different protocols and scopes. The platform offers the external Application Load Balancer for HTTP(S) traffic, which operates globally using proxy-based distribution to handle content-based routing and SSL offloading. The external passthrough Network Load Balancer supports TCP/SSL/UDP protocols with non-proxied forwarding for low-latency applications, while internal Application and passthrough Network Load Balancers manage intra-VPC traffic for private services. These load balancers integrate with managed instance groups (MIGs) as backend services, enabling automatic distribution of traffic across multiple VM instances for high availability and scalability.[62]

IP address management in Compute Engine distinguishes between ephemeral and static external IPs to support reliable external connectivity. Ephemeral external IPs are automatically assigned from Google's pool upon VM creation and released when the instance stops or terminates, making them suitable for temporary workloads but unsuitable for services requiring persistent addressing. Static external IPs can be reserved in advance or promoted from an existing ephemeral IP, ensuring consistent public access for DNS records or external integrations, with options for regional or global scopes. Reservations allow pre-allocation without attachment to a specific instance, facilitating flexible assignment across projects or regions.[63][64]

Global load balancing leverages anycast IP addressing to route traffic to the nearest healthy backend across worldwide regions, minimizing latency for multi-region deployments. A single anycast IP serves as the frontend, with Google's edge network directing packets based on proximity and backend health, supporting both premium and standard network tiers for optimized performance. This approach enables seamless failover and content delivery integration, such as with Cloud CDN, for applications spanning multiple zones or regions.[65][66]

Autoscaling integrates with load balancing through backend services and health checks to dynamically adjust instance counts based on traffic demands. Health checks probe instance groups at configurable intervals to verify responsiveness, removing unhealthy backends from load distribution and triggering autoscaling policies. Autoscalers can base decisions on load balancing capacity metrics, such as serving capacity or HTTP request rates, ensuring resources scale in tandem with incoming traffic while integrating with MIGs for rolling updates.[67][68]

Recent enhancements to global load balancing include traffic isolation policies, introduced in May 2025, which route requests preferentially to the closest region for multi-region applications, reducing latency in preview mode. Additionally, failover capabilities for global external Application Load Balancers, reaching general availability in November 2024, provide regional backup backends for improved resilience in distributed setups. These updates build on prior optimizations like service load balancing policies from July 2024, enhancing multi-region traffic management.[69]
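The static IP workflow described above can be sketched with gcloud; the address and instance names are placeholders, and a global address would be reserved with --global instead of --region:

```bash
# Reserve a regional static external IP, then attach it to a new VM at creation time.
gcloud compute addresses create web-frontend-ip --region=us-central1

gcloud compute instances create web-vm \
    --zone=us-central1-a \
    --machine-type=e2-small \
    --address=web-frontend-ip \
    --image-family=debian-12 \
    --image-project=debian-cloud
```

Images and Snapshots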
Operating System Images
Google Compute Engine provides a variety of preconfigured public operating system (OS) images that users can select to boot virtual machine (VM) instances, ensuring compatibility with Google's infrastructure. These images include popular Linux distributions and Windows Server editions, all optimized for cloud workloads with built-in support for features like automatic security updates and IPv6 networking. Public images are maintained by Google or partners and are available at no additional licensing cost for most Linux variants, while Windows images incur on-demand licensing fees.[70][71]

Among the supported Linux public images are Debian (versions 13, 12, and 11), Ubuntu LTS (such as 24.04 and 22.04), and CentOS Stream (10 and 9), each with default disk sizes ranging from 10 GB to 20 GB and configurations that disable root password login for enhanced security. For Windows, public images encompass Server 2022, 2019, 2016, and the 2025 edition, which achieved general availability in late 2024 and supports extended security updates until November 2034; these images enable automatic updates and integration with Google Cloud tools like the guest environment for metadata access. Users can list and select these images via the Google Cloud console or gcloud CLI commands, with regular patches applied for critical vulnerabilities.[71][70][72][73]

In addition to public images, users can create custom images to tailor environments with specific software or configurations. Custom images are generated from existing boot disks, snapshots, or imported virtual disks stored in Cloud Storage, allowing for the pre-installation of applications before launching instances. This approach supports scenarios like migrating on-premises workloads or standardizing VM setups across deployments.

To manage version updates efficiently, Google Compute Engine uses image families, which are logical groupings of related images within projects like debian-cloud or ubuntu-os-cloud. For instance, the debian-11 family always references the latest non-deprecated Debian 11 image, enabling rolling updates without manual intervention; if issues arise, administrators can deprecate the current image to revert to a prior stable version. This mechanism ensures access to the most recent stable releases while avoiding end-of-life versions.[70]
Deprecated images, such as those for Windows Server 2012 and Ubuntu 20.04, enter an end-of-support phase where Google ceases updates and eventually deletes them from public availability, prompting users to migrate to supported alternatives. During deprecation, image families automatically exclude these versions, and existing VMs can continue running but without security patches or compatibility guarantees; extended paid support may be available for select OSes like Windows via Microsoft's programs.[74][71][75]
Images in Google Compute Engine operate on a global scope, allowing them to be shared seamlessly across projects and regions without duplication. Public images are inherently accessible project-wide, while custom images can be exported to Cloud Storage or granted permissions via IAM policies for use in other projects, facilitating consistent deployments in multi-project environments.[70]
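A sketch of the custom image and image family workflow with gcloud; the disk, image, family, and project names are placeholders:

```bash
# Build a custom image from an existing boot disk and place it in an image family.
gcloud compute images create webapp-image-v2 \
    --source-disk=webapp-builder-disk \
    --source-disk-zone=us-central1-a \
    --family=webapp-images

# Launch a VM from whatever image is currently the latest in that family,
# including from another project granted access to the image.
gcloud compute instances create webapp-vm \
    --zone=us-central1-a \
    --image-family=webapp-images \
    --image-project=my-project
```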
Snapshots and Data Backup
Google Compute Engine provides disk snapshots as a mechanism for backing up data from persistent disks and Hyperdisks. These snapshots capture the contents of a disk at a specific point in time and serve as incremental backups, storing only the data that has changed since the previous snapshot to optimize storage efficiency. All disk snapshots are encrypted at rest using Google-managed keys, and they are stored in Google Cloud Storage, with options for multi-regional or regional locations to ensure durability and availability.[76][77]

Disk snapshots can be created manually through the Google Cloud console, gcloud CLI, or REST APIs, allowing users to initiate backups on demand. Automated creation is supported via snapshot schedules, which enable periodic backups at user-defined intervals, such as daily or weekly, configurable through the APIs. Retention policies can be applied to manage snapshot lifecycle, with standard snapshots suitable for short- to medium-term retention and archive snapshots designed for long-term storage at lower costs; snapshots persist independently of the source disk and can be retained indefinitely until manually deleted.[77][78]

To restore data from a disk snapshot, users create a new persistent disk or Hyperdisk from the snapshot, which must be at least as large as the original source disk. This new disk can then be attached to a running or new virtual machine instance using the console, gcloud commands, or APIs, after which the file system is mounted to access the restored data. The restoration process supports both zonal and regional disks, enabling quick recovery without downtime for the original instance.[79]

In addition to disk-level backups, Google Compute Engine offers machine images, which provide full backups of an entire virtual machine instance. A machine image captures the instance's configuration, metadata, permissions, operating system, and data from all attached disks in a crash-consistent manner, using differential snapshots for subsequent images to store only changes from prior versions. These are particularly useful for cloning instances, troubleshooting, or replicating environments across projects.[80]

Disk snapshots and machine images have global scope, allowing them to be created and restored in any region or zone within the project. Standard and archive snapshots are automatically replicated across multiple regions for high durability (up to 99.999999999% over a year), facilitating disaster recovery by enabling quick restoration in a secondary region during outages.[76][80]
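A gcloud sketch of the on-demand, scheduled, and restore paths described above; all names, times, and retention values are placeholders:

```bash
# Take an on-demand snapshot of a zonal disk.
gcloud compute snapshots create app-data-snap-001 \
    --source-disk=app-data-disk \
    --source-disk-zone=us-central1-a

# Define a daily snapshot schedule and attach it to the disk.
gcloud compute resource-policies create snapshot-schedule daily-backups \
    --region=us-central1 \
    --daily-schedule \
    --start-time=04:00 \
    --max-retention-days=14

gcloud compute disks add-resource-policies app-data-disk \
    --resource-policies=daily-backups \
    --zone=us-central1-a

# Restore by creating a new disk from the snapshot, then attach it to a VM.
gcloud compute disks create app-data-restored \
    --source-snapshot=app-data-snap-001 \
    --zone=us-central1-a
```

Features and Capabilities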
Performance Optimizations
Google Compute Engine offers various CPU platforms to optimize virtual machine (VM) performance based on workload requirements, supporting Intel Xeon processors such as Granite Rapids and Sapphire Rapids for general-purpose and compute-intensive tasks, AMD EPYC processors including Turin and Genoa for cost-effective scale-out applications, and Arm-based processors like Google Axion and NVIDIA Grace for energy-efficient AI and cloud-native workloads.[31] These platforms enable users to select machine series tailored to specific architectures, with Intel providing advanced vector extensions like AVX512 for high-performance computing, AMD offering strong multi-threaded performance for databases, and Arm delivering up to 50% better price-performance in certain inference scenarios.[32]

Transparent maintenance in Compute Engine is achieved through automatic live migration, which seamlessly transfers running VMs to healthy hosts during infrastructure events like hardware repairs or software updates without downtime, reboot, or changes to instance configurations such as IP addresses or attached storage.[81] This process involves a brief blackout period of under one second, during which the VM's memory state is copied to the target host, ensuring high availability for most workloads while excluding specialized setups like those with attached GPUs or large local SSDs.[81]

Disk performance can be enhanced by tuning IOPS and throughput provisions, particularly with Hyperdisk volumes that allow dynamic adjustments every four to six hours without detaching the disk, enabling workloads to scale from baseline levels up to 350,000 IOPS for Hyperdisk Extreme or 1,200,000 MiB/s throughput for Hyperdisk ML.[82] For read-heavy applications, Hyperdisk supports asynchronous replication to create read replicas in a secondary region, providing low-latency access to duplicated data while maintaining primary write performance.[57] Optimization techniques include aligning application I/O patterns with provisioned limits and using tools like fio for benchmarking to avoid shared resource contention across multiple disks.[83]
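Two of these controls can be set directly from the CLI; a sketch with placeholder names, where the provisioned-performance flags apply to Hyperdisk volumes and should be checked against the current gcloud reference:

```bash
# Keep a VM on live migration during host maintenance events.
gcloud compute instances set-scheduling demo-vm \
    --zone=us-central1-a \
    --maintenance-policy=MIGRATE

# Retune a Hyperdisk volume's provisioned performance in place.
gcloud compute disks update analytics-disk \
    --zone=us-central1-a \
    --provisioned-iops=20000 \
    --provisioned-throughput=600
```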
Acceleration for machine learning workloads is facilitated by attaching GPUs or TPUs to VMs, with GPU-enabled machine types like the A4 series integrating up to eight NVIDIA B200 GPUs for training large models and the G2 series supporting up to eight L4 GPUs for efficient inference.[84] TPUs, optimized for tensor operations, can be deployed as TPU VMs directly connected to Compute Engine instances for hybrid setups, accelerating frameworks like TensorFlow with up to 10x peak performance gains over previous generations in training and serving tasks.[85] These attachments require compatible machine types and zones, ensuring seamless integration for data processing and graphics-intensive applications.[86]
As of November 2025, the N4D machine series, powered by AMD EPYC Turin processors and featuring up to 768 GB of DDR5 memory, delivers up to 3.5x better price-performance for web-serving workloads, 50% higher performance for general computing, and 70% higher performance for Java workloads compared to the prior N2D series.[45] The series supports custom machine types with up to 96 vCPUs and a 4.1 GHz max-boost frequency, providing substantial gains in memory-bound scenarios without changes to existing maintenance and migration behavior.[21]
Management and Automation Tools
Google Compute Engine provides several tools for orchestrating, monitoring, and automating the lifecycle of virtual machine instances, enabling efficient management at scale. Cloud Deployment Manager is an infrastructure-as-code service that automates the creation, updating, and deletion of Compute Engine resources through declarative configuration files written in YAML or Python templates. It supports deploying instance groups, networks, and disks by leveraging underlying Compute Engine APIs, allowing users to version and reuse infrastructure definitions for consistent environments.[87] Note that Cloud Deployment Manager will reach end of support on March 31, 2026, with recommendations to migrate to alternatives like Infrastructure Manager.[88]

For broader orchestration, Google Compute Engine integrates seamlessly with Terraform, an open-source infrastructure-as-code tool from HashiCorp. Users can provision and manage Compute Engine instances, such as virtual machines and autoscalers, using Terraform's declarative language and the official Google provider, which translates configurations into API calls for creating resources like google_compute_instance.[89] This integration supports complex setups, including state management and dependency handling, to ensure reproducible deployments across projects.[90]
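For completeness, a minimal Deployment Manager sketch of the declarative workflow mentioned above, bearing in mind the service's announced end of support; the deployment and configuration file names are placeholders:

```bash
# Deploy resources described in a YAML configuration.
gcloud deployment-manager deployments create demo-deployment \
    --config=vm-config.yaml

# Stage an update as a preview; running "deployments update demo-deployment"
# again without --preview would apply the staged changes.
gcloud deployment-manager deployments update demo-deployment \
    --config=vm-config.yaml \
    --preview
```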
Monitoring and logging capabilities are essential for maintaining Compute Engine operations, with Cloud Monitoring collecting metrics such as CPU utilization, disk I/O, and network traffic from virtual machine instances via the Ops Agent.[91] Users can create dashboards, set alerting policies based on thresholds, and visualize performance trends to proactively address issues.[92] Complementing this, Cloud Logging aggregates and analyzes logs from Compute Engine VMs, including system events and application outputs, enabling real-time search, filtering, and export for compliance and troubleshooting.[93] These tools integrate with other Google Cloud services to provide unified observability across hybrid environments.[94]
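The Ops Agent is typically installed on each VM so that Cloud Monitoring and Cloud Logging receive detailed metrics and logs; the commands below follow Google's documented installer script path at the time of writing and should be verified against current documentation:

```bash
# Download and run the Ops Agent repository/installer script on a Linux VM.
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install
```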
Autoscaling in Compute Engine is handled through managed instance groups (MIGs), which automatically adjust the number of virtual machine instances based on predefined policies tied to metrics like CPU usage, memory consumption, or custom signals from Cloud Monitoring.[95] For example, a policy can target 60% average CPU utilization, adding instances when sustained utilization rises above the target and removing them when it falls below, ensuring resource elasticity without manual intervention.[96] This feature supports both zonal and regional MIGs, with options for predictive scaling based on historical load patterns to minimize latency during traffic spikes.[97]
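A gcloud sketch of this setup; the group, template, and limit values are placeholders chosen to mirror the 60% target described above:

```bash
# Create a zonal MIG from an existing instance template.
gcloud compute instance-groups managed create web-mig \
    --zone=us-central1-a \
    --template=web-template \
    --size=2

# Enable CPU-based autoscaling toward a 60% utilization target.
gcloud compute instance-groups managed set-autoscaling web-mig \
    --zone=us-central1-a \
    --min-num-replicas=2 \
    --max-num-replicas=10 \
    --target-cpu-utilization=0.6 \
    --cool-down-period=90
```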
Operating system management is streamlined via OS Config, a service within VM Manager that automates patching, compliance reporting, and configuration enforcement on Compute Engine instances.[98] It applies OS updates using native mechanisms for supported images like Debian, RHEL, and Windows, with scheduling options to patch during maintenance windows and assess compliance against baselines such as CIS benchmarks.[99] The OS Config agent, installed on VMs, reports patch status and vulnerabilities, enabling fleet-wide remediation without downtime risks.[100]
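On-demand patching can also be triggered from the CLI; the os-config command group below is part of VM Manager, but the exact flag spellings are an assumption to verify against the current gcloud reference:

```bash
# Start a patch job across all VMs in the project, then check its status.
# (Flag names are assumptions; consult the gcloud os-config documentation.)
gcloud compute os-config patch-jobs execute \
    --instance-filter-all \
    --display-name="monthly-os-patch"

gcloud compute os-config patch-jobs list
```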
Automation is further enhanced through the gcloud CLI, part of the Google Cloud SDK, which provides scripting-friendly commands for managing Compute Engine resources programmatically. Commands like gcloud compute instances create and gcloud compute instance-groups managed list-instances support batch operations, filtering, and output formatting in JSON or YAML for integration into CI/CD pipelines or shell scripts.[101] As of November 10, 2025, observability fields for reservations became generally available, allowing users to query via API or CLI which reservations a VM consumes and list VMs attached to specific reservations, improving visibility into committed resource utilization.[21]
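A scripting-oriented sketch combining filtering and machine-readable output, as used in pipelines; the filter expression and projected fields are examples only:

```bash
# Emit running instances in one zone as JSON for downstream tooling.
gcloud compute instances list \
    --filter="status=RUNNING AND zone:us-central1-a" \
    --format="json(name, machineType, status)"
```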