Google Cloud Launches A4X VMs with NVIDIA GB200 NVL72 for Hyperscale AI

📅 October 2025 · ⚡ High impact · 🏷️ launch

📰 The Announcement

Google Cloud's October 2025 launch of A4X virtual machines brings NVIDIA's GB200 NVL72 rack-scale systems to its hyperscale AI infrastructure. Each A4X node couples 72 Blackwell B200 GPUs over NVLink and provides approximately 13.5TB of HBM3e memory per node. NVIDIA's headline rating of roughly 1.4 exaFLOPS per rack applies to FP4; the corresponding FP8 peak is about half that, around 720 petaFLOPS. On-demand pricing is set at approximately $340 per hour for a full NVL72 node, and 1-year Committed Use Discounts (CUDs) bring that figure down to roughly $221 per hour, a 35% reduction that meaningfully improves economics for sustained training runs. Google is initially rolling out A4X in us-central1 and europe-west4, with asia-northeast1 availability targeted for Q1 2026. Alongside A4X, Google announced general availability of TPU v7, giving customers a first-party alternative at comparable price points for transformer-heavy workloads.

Competitive context matters for procurement decisions. AWS's equivalent P6e instances, powered by NVIDIA GB200 and available in clusters of up to 64 GPUs, are priced at approximately $310–$330 per hour per node on-demand, with 1-year Reserved Instance pricing near $210 per hour. That makes AWS marginally cheaper on list price, but with fewer NVLink-connected GPUs per node. Azure's NDv5 H200 instances (ND H200 v5 series, 8 H200 GPUs per VM) run at roughly $98 per hour on-demand per 8-GPU VM, so a 72-GPU cluster built from nine such VMs costs approximately $880 per hour, about 2.6 times the A4X on-demand rate and on older-generation GPUs. Oracle Cloud Infrastructure's BM.GPU.B4.8 (H100-based) remains a generation behind at around $32 per hour per 8-GPU bare-metal node. On a price-per-petaflop basis for FP8 inference and training workloads, GCP A4X compares favorably, particularly when CUDs are applied, and the NVL72 unified memory fabric reduces the inter-GPU communication overhead that fragmented multi-VM setups on competing clouds incur.
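The comparison above mixes 72-GPU and 8-GPU units, so it helps to normalize before comparing. The following is a minimal sketch, not a pricing tool: it uses only the approximate hourly figures quoted in this article, and the `Offer` class and function names are hypothetical. AWS P6e is left out because the article does not state how many GPUs its per-node price covers. Per-GPU-hour figures also ignore generation differences between B200, H200, and older parts, so treat the output as a first-pass sanity check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    name: str
    hourly_usd: float  # price for one billable unit, as quoted in the article
    gpus: int          # GPUs in that unit


# Approximate figures from the article; verify against current price lists.
OFFERS = [
    Offer("GCP A4X on-demand (NVL72)", 340.0, 72),
    Offer("GCP A4X 1-year CUD (NVL72)", 221.0, 72),
    Offer("Azure ND H200 v5 on-demand", 98.0, 8),
    Offer("OCI BM.GPU.B4.8 on-demand", 32.0, 8),
]


def per_gpu_hour(offer: Offer) -> float:
    """Hourly price divided by the GPUs in one billable unit."""
    return offer.hourly_usd / offer.gpus


def cost_for_gpus(offer: Offer, gpus_needed: int) -> float:
    """Hourly cost of enough whole units to cover gpus_needed GPUs."""
    units = -(-gpus_needed // offer.gpus)  # ceiling division
    return units * offer.hourly_usd


if __name__ == "__main__":
    for o in OFFERS:
        print(
            f"{o.name:30s} ${per_gpu_hour(o):6.2f}/GPU-hr  "
            f"${cost_for_gpus(o, 72):8.2f}/hr for 72 GPUs"
        )
```

Rounding up to whole units matters for small VM shapes: a requirement of 72 GPUs maps to nine 8-GPU VMs, while 73 GPUs would require ten.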

The launch matters most for three customer segments: large language model pre-training teams running multi-trillion-parameter models, enterprise AI labs scaling reinforcement learning from human feedback (RLHF) pipelines, and hyperscale inference providers serving sub-50ms latency SLAs at massive concurrency. The 13.5TB HBM3e pool per node simplifies model parallelism: a 405B-parameter model needs roughly 0.8TB for its BF16 weights alone, so organizations can fit 405B+ parameter models entirely within a single NVL72 node without tensor sharding across hosts. Competitive pressure is now acute on AWS and Azure to accelerate their own GB200 NVL72 deployments and pricing; expect both providers to respond with aggressive CUD/RI discounts within 60–90 days. Key caveats include limited regional availability at launch, possible GPU allocation queuing given early supply constraints, and meaningful Terraform and Kubernetes reconfiguration costs when migrating existing A100 or H100 workloads to the new NVLink fabric topology. Vendor lock-in risk is also elevated, as TPU v7 co-availability creates an incentive for Google to bundle pricing in ways that make cross-cloud portability harder to justify financially.
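Whether a model "fits in one node" depends on what is counted. The sketch below estimates the footprint under stated assumptions: 2 bytes per parameter for BF16 weights only, and a common rule of thumb of about 16 bytes per parameter for mixed-precision Adam training (weights, gradients, and FP32 optimizer state). Both multipliers and the 80% headroom factor are illustrative assumptions, not figures from Google or NVIDIA, and activations and KV cache are not included.

```python
HBM_PER_NODE_TB = 13.5  # approximate per-node HBM3e figure quoted in the article

BF16_WEIGHTS_ONLY = 2.0     # bytes per parameter, inference weights only
ADAM_MIXED_PRECISION = 16.0  # bytes per parameter, rule-of-thumb training state


def footprint_tb(params_billion: float, bytes_per_param: float) -> float:
    """Memory in decimal terabytes for a given per-parameter byte budget."""
    return params_billion * 1e9 * bytes_per_param / 1e12


def fits_in_node(
    params_billion: float,
    bytes_per_param: float,
    headroom: float = 0.8,  # assumed fraction of HBM usable for model state
) -> bool:
    """True if the estimated footprint fits within the usable share of HBM."""
    return footprint_tb(params_billion, bytes_per_param) <= HBM_PER_NODE_TB * headroom


if __name__ == "__main__":
    for size in (405, 1000):
        for label, bpp in (("BF16 weights", BF16_WEIGHTS_ONLY),
                           ("Adam training", ADAM_MIXED_PRECISION)):
            tb = footprint_tb(size, bpp)
            print(f"{size}B params, {label}: {tb:.2f} TB, fits={fits_in_node(size, bpp)}")
```

Under these assumptions a 405B-parameter model fits for both inference and training state, while a 1-trillion-parameter training run does not fit in a single node and would still need to be sharded across nodes.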

For customers with active large-scale AI training or inference workloads, the immediate action is to model your current per-GPU-hour spend against A4X CUD pricing. Organizations currently running more than 256 A100 or H100 GPUs continuously on any cloud should request A4X quota in us-central1 now, as early allocation windows tend to fill within 30–60 days of a major launch. For teams spending more than $500,000 per month on GPU compute, the 1-year CUD at $221 per hour versus on-demand at $340 per hour represents a potential saving of over $1 million annually per NVL72 node at full utilization. Before committing, benchmark your specific model architecture on A4X, paying particular attention to FP8 versus BF16 throughput trade-offs, since not all workloads will reach the theoretical peak FLOP counts in practice. Teams using Kubernetes-based ML platforms should also validate their NCCL and NVLink topology-aware scheduling configurations before signing a CUD, as misconfigured collective communication can negate the hardware advantages entirely.
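The "over $1 million" figure holds only at full utilization, so it is worth working the arithmetic for your own usage. This is a minimal sketch using the article's approximate rates; the function names are hypothetical. It assumes a commitment is billed for every hour of the year whether or not the node is used, while on-demand is billed only for hours actually used.

```python
HOURS_PER_YEAR = 8760  # 365 days; ignores leap years

ON_DEMAND_RATE = 340.0  # USD per node-hour, approximate figure from the article
CUD_RATE = 221.0        # USD per node-hour, approximate figure from the article


def annual_cost_on_demand(rate: float, utilization: float) -> float:
    """On-demand cost when the node runs for `utilization` of the year."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return rate * HOURS_PER_YEAR * utilization


def annual_cost_committed(rate: float) -> float:
    """Committed cost: every hour is billed regardless of use."""
    return rate * HOURS_PER_YEAR


def annual_saving(on_demand: float, committed: float, utilization: float) -> float:
    """Positive when the commitment is cheaper; negative when it loses money."""
    return annual_cost_on_demand(on_demand, utilization) - annual_cost_committed(committed)


def break_even_utilization(on_demand: float, committed: float) -> float:
    """Utilization above which the commitment beats on-demand."""
    return committed / on_demand


if __name__ == "__main__":
    for u in (1.0, 0.8, 0.65, 0.5):
        print(f"utilization {u:.0%}: saving ${annual_saving(ON_DEMAND_RATE, CUD_RATE, u):,.0f}")
    print(f"break-even: {break_even_utilization(ON_DEMAND_RATE, CUD_RATE):.0%}")
```

With these rates the commitment saves about $1.04M per node per year at 100% utilization, breaks even at 65%, and costs more than on-demand below that.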

At TCOIQ, we view the A4X launch as exactly the kind of inflection point where a structured TCO analysis prevents costly over-commitment or missed savings. The TCOIQ TCO Calculator at tcoiq.com/tco.html can model A4X on-demand versus 1-year CUD versus AWS P6e Reserved versus Azure NDv5 side-by-side, incorporating your actual GPU utilization rates, storage egress, and networking costs to produce a true total cost of ownership rather than a headline hourly rate comparison. The Inventory Builder at tcoiq.com/inventory.html allows you to tag and categorize your existing GPU fleet so you can identify which workloads are genuine migration candidates for A4X and which are better served by staying on H100 or moving to TPU v7. Our AI Migration Assessment surfaces hidden costs like data transfer, model re-optimization time, and cluster reconfiguration that vendors never include in their pitch decks. Start by running your current GPU inventory through the TCOIQ Inventory Builder to produce a migration-readiness score before making any CUD commitment.

💰 TCOIQ Cost Impact

On-demand: ~$340/hr per 72-GPU NVL72 node; 1-year CUD: ~$221/hr, saving ~$1.04M annually per node at full utilization versus on-demand pricing.

📊 Why It Matters · Impact Analysis

The A4X launch most directly benefits AI research labs, large language model pre-training teams, and hyperscale inference providers that require unified high-bandwidth memory at rack scale, enabling 405B+ parameter models to fit within a single NVL72 node without cross-host tensor sharding. Enterprise FinOps teams managing GPU budgets above $500K per month stand to capture over $1 million in annual savings per node by choosing 1-year CUDs over on-demand pricing. Competitive pressure on AWS and Azure is significant; both will likely accelerate their own GB200 NVL72 deployments and sharpen Reserved Instance pricing within 60–90 days. Key downsides include limited launch regions (us-central1 and europe-west4 only through year-end), potential GPU allocation queuing due to early supply constraints, and elevated vendor lock-in risk as Google bundles TPU v7 co-availability to reduce cross-cloud portability incentives.

✅ What You Should Do

Audit your current GPU fleet using TCOIQ Inventory Builder — identify all workloads running 256+ A100 or H100 GPUs continuously as primary A4X migration candidates before quota windows close in the next 30–60 days.
Model A4X 1-year CUD at $221/hr against your current on-demand GPU spend in the TCOIQ TCO Calculator; any team spending over $500K/month on GPU compute should quantify the $1M+ annual saving potential per NVL72 node before committing.
Request A4X quota in us-central1 immediately if your organization runs sustained large-scale LLM pre-training or RLHF pipelines — early allocation requests submitted within the first 45 days of launch have historically received priority provisioning on new GCP hardware generations.
Benchmark your specific model architecture on A4X in FP8 mode versus your current BF16 H100 baseline before signing any CUD — validate that real-world throughput achieves at least 80% of theoretical peak to justify the commitment.
Run a TCOIQ AI Migration Assessment to surface hidden migration costs including data transfer fees, NCCL/NVLink topology reconfiguration, and model re-optimization time — these costs can erode 10–25% of projected savings if unaccounted for in your business case.
Compare A4X CUD pricing against AWS P6e 1-year Reserved ($210/hr) and Azure NDv5 72-GPU cluster equivalents (about $880/hr for nine 8-GPU VMs at $98/hr) using the TCOIQ TCO Calculator to confirm GCP's price-per-petaflop advantage for your specific FP8 workload mix before any multi-year commitment.
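Two of the checks above reduce to simple thresholds: the 80% throughput target and the 10–25% erosion of projected savings. The sketch below shows one way to encode them as a pre-commitment gate. The function names and the sample throughput numbers are hypothetical; the gross saving is the full-utilization difference between the article's on-demand and CUD rates.

```python
GROSS_ANNUAL_SAVING = (340.0 - 221.0) * 8760  # USD per node, full utilization


def passes_benchmark(measured: float, peak: float, threshold: float = 0.8) -> bool:
    """True if measured throughput reaches `threshold` of theoretical peak.

    `measured` and `peak` must use the same unit (for example petaFLOPS).
    """
    if peak <= 0:
        raise ValueError("peak must be positive")
    return measured / peak >= threshold


def net_saving_range(
    gross: float,
    erosion_low: float = 0.10,
    erosion_high: float = 0.25,
) -> tuple[float, float]:
    """(worst case, best case) saving after hidden migration costs."""
    return gross * (1.0 - erosion_high), gross * (1.0 - erosion_low)


if __name__ == "__main__":
    worst, best = net_saving_range(GROSS_ANNUAL_SAVING)
    print(f"net saving per node: ${worst:,.0f} to ${best:,.0f}")
    # Illustrative benchmark result only, not a measured value.
    print("benchmark ok:", passes_benchmark(measured=600.0, peak=720.0))
```

Applying the erosion range to the gross figure gives a net saving of roughly $782K to $938K per node per year, which is the number a business case should carry rather than the headline $1M+.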

🎯 TCOIQ Recommendation

TCOIQ's view is that the A4X launch creates a rare but time-limited window to lock in best-in-class price-per-petaflop economics, but only for teams that go into a CUD commitment with rigorous utilization and migration-cost data. The TCOIQ TCO Calculator at tcoiq.com/tco.html enables side-by-side modeling of A4X CUD versus AWS P6e Reserved versus Azure NDv5 with your actual utilization rates and ancillary costs factored in, eliminating the vendor-math bias inherent in cloud provider TCO tools. The Inventory Builder at tcoiq.com/inventory.html lets you tag your existing GPU workloads by migration readiness, while the AI Migration Assessment quantifies hidden reconfiguration and re-optimization costs that can erode 10–25% of projected savings. Start today by uploading your current GPU inventory to the TCOIQ Inventory Builder to generate a migration-readiness score before committing to any A4X CUD.

→ Model this in TCOIQ TCO Calculator

📎 Original source: Introducing A4X VMs with NVIDIA GB200 NVL72 on Google Cloud