🔵 Google Cloud

GCP Sustained Use Discounts Now Apply to GPU Instances — A100 Gets Automatic 30% Discount for Always-On Workloads

📅 February 2026 ✍️ TCOIQ Analysis ⚡ Medium Impact

What are GCP Sustained Use Discounts?

Sustained Use Discounts (SUD) are GCP's automatic pricing mechanism that reduces the effective hourly rate for compute resources that run for a significant portion of the month. No commitment, no reservation, no configuration required — GCP simply calculates how much of the month each resource ran and applies a sliding discount. At 25% of the month: 20% discount. At 50%: 25% discount. At 100% (always-on): 30% discount. Until now, SUD only applied to standard CPU and memory instances, not GPUs.
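The sliding scale above can be sketched as a small function. Note this is a simplification: the article quotes only three tier points, and GCP's actual mechanism applies incremental per-tier rates rather than a flat step, so the in-between behaviour here is an assumption.

```python
def sud_discount(utilisation: float) -> float:
    """Approximate SUD discount for the fraction of the month a resource
    runs, using the tier points quoted above (step behaviour between
    tiers is a simplifying assumption)."""
    if utilisation >= 1.0:
        return 0.30   # always-on
    if utilisation >= 0.50:
        return 0.25
    if utilisation >= 0.25:
        return 0.20
    return 0.0        # below 25% of the month: no discount

# Effective rate for an always-on A100 at the base GPU price
base_rate = 2.933  # $/hr, a2-highgpu-1g GPU portion
effective = base_rate * (1 - sud_discount(1.0))
print(f"${effective:.3f}/hr")  # -> $2.053/hr
```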

What Changed?

Google extended Sustained Use Discounts to cover GPU instances, including the NVIDIA A100 80GB (a2-highgpu series), NVIDIA L4 (g2-standard series), and NVIDIA T4 (n1 series with T4 accelerator). The SUD percentages match standard compute: 20-30% depending on monthly utilisation, applied specifically to the GPU portion of the instance cost. An a2-highgpu-1g instance (12 vCPU, 85GB RAM, 1× A100 80GB) running 24/7 at the base GPU price of $2.933/hr now receives the full 30% SUD, reducing the effective rate to $2.053/hr.

Why Does This Matter?

For ML teams running persistent GPU workloads (model serving endpoints, inference APIs, or ongoing training pipelines), the automatic 30% saving is significant. An A100 inference server running 24/7 previously cost an effective $2.933/hr; it now costs an effective $2.053/hr, a saving of $648/month per GPU instance with no action required. For a team running 10 A100 inference servers, that is $6,480/month, or $77,760/year, in automatic savings. Previously, getting any GPU discount on GCP required purchasing a 1-year or 3-year CUD commitment; now variable workloads that run most of the month get 20-30% off automatically.
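The per-GPU monthly figure depends on how many hours you count in a month, which is worth making explicit when modelling your own fleet. A sketch using 730 billed hours (a common monthly convention; actual months range 720-744 hours, which is why the article's $648 figure differs slightly from the result below):

```python
HOURS_PER_MONTH = 730  # assumed billing convention; real months vary 720-744 hrs

def monthly_gpu_saving(base_rate_hr: float, discount: float = 0.30,
                       hours: float = HOURS_PER_MONTH) -> float:
    """Monthly SUD saving for one always-on GPU at the given base rate."""
    return base_rate_hr * discount * hours

per_gpu = monthly_gpu_saving(2.933)   # ~$642/month per A100 at 730 hrs
fleet_monthly = 10 * per_gpu          # fleet of 10 inference servers
print(f"${per_gpu:,.0f}/GPU/month, ${fleet_monthly * 12:,.0f}/year for 10 GPUs")
```

Swap in 744 hours for a 31-day month to see the upper bound of the saving.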

How to Maximise This

SUD is calculated per project, per region, per GPU type, so consolidate GPU workloads within the same GCP project and region where possible. If you have GPU inference servers that currently scale down to zero at night, evaluate whether keeping a minimum always-on capacity (one GPU, auto-scaling up during peak) saves more through SUD than scaling to zero does. For typical inference patterns, one always-on GPU at an effective $2.053/hr is often cheaper than two on-demand GPUs at $2.933/hr each for 12 hours, because only the persistent instance earns the full SUD.
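The break-even comparison above can be checked directly. Rates come from the article; the two scaling patterns are illustrative assumptions about a daily traffic cycle:

```python
SUD_RATE = 2.053   # effective $/hr for an always-on A100 (full 30% SUD)
OD_RATE = 2.933    # on-demand $/hr, too intermittent to earn full SUD

# Option A: one GPU kept running 24 hours/day (earns the 30% SUD)
always_on_daily = 1 * 24 * SUD_RATE   # $49.27/day

# Option B: scale to zero overnight, two GPUs for a 12-hour peak
burst_daily = 2 * 12 * OD_RATE        # $70.39/day

print(f"always-on ${always_on_daily:.2f}/day vs burst ${burst_daily:.2f}/day")
```

Under these assumptions the always-on option is about $21/day cheaper; rerun with your own peak width and GPU count before changing an auto-scaling policy.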

Who Should Act Now

GCP teams running GPU instances for ML inference, model serving, or ongoing training should check their billing dashboard: SUD is applied automatically and retroactively within the billing month, with no configuration needed. For teams considering CUD purchases for GPU instances, SUD now provides a 30% discount for always-on workloads with zero commitment, so only purchase a 3-year GPU CUD if you need guaranteed capacity, not just the discount; SUD and CUD discounts do not stack.

💰 TCOIQ Cost Impact
A100 running 24/7: automatic saving from $2.933/hr to $2.053/hr effective — $648/month per GPU with zero commitment or configuration
📎 Official Source: GCP Sustained Use Discounts Documentation ↗


Calculate Your Actual Saving

Use TCOIQ free tools to model this against your specific workload and infrastructure.
