Spot and Preemptible Instances Guide 2025: Up to 90% Savings with the Right Architecture
What Are Spot Instances?
Spot instances run on a cloud provider's spare capacity at a steep discount. The trade-off: the provider can reclaim the instance at short notice, 2 minutes on AWS Spot or 30 seconds on GCP. That interruption risk rules out some workloads, but for the rest the savings are dramatic.
Spot Pricing Comparison
| Provider | Name | Typical Discount | Notice Period | Max Duration |
|---|---|---|---|---|
| AWS | Spot Instances | 60-90% | 2 minutes | Unlimited |
| Azure | Spot Virtual Machines | 60-80% | 30 seconds | Unlimited |
| GCP | Spot VMs / Preemptible | 60-91% | 30 seconds | Unlimited (Spot VMs); 24 hours (legacy Preemptible) |
| OCI | Preemptible Instances | ~50% | 1 minute | Unlimited |
Real Pricing Example — m7i.2xlarge
| Purchase Model | Price/hr | Monthly (730hr) | vs On-Demand |
|---|---|---|---|
| On-Demand | $0.4032 | $294 | Baseline |
| Spot (avg 70% disc) | ~$0.121 | ~$88 | -70% |
| 3yr RI (All Upfront) | $0.161 | $118 | -60% |
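The arithmetic behind the table can be reproduced with a short sketch. The hourly prices are the ones quoted above; the helper names `monthly_cost` and `discount_vs` are illustrative, and the 730-hour month follows the table header.

```python
HOURS_PER_MONTH = 730  # same convention as the table header


def monthly_cost(hourly_price: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of running one instance for a full month at a flat hourly price."""
    return hourly_price * hours


def discount_vs(baseline_hourly: float, hourly: float) -> float:
    """Fractional saving relative to the baseline price (0.70 means 70% cheaper)."""
    return 1 - hourly / baseline_hourly


on_demand = 0.4032                 # m7i.2xlarge On-Demand, from the table
spot = on_demand * (1 - 0.70)      # assumes the 70% average Spot discount
ri_3yr = 0.161                     # 3yr RI (All Upfront) effective hourly rate

for name, price in [("On-Demand", on_demand), ("Spot", spot), ("3yr RI", ri_3yr)]:
    saving = discount_vs(on_demand, price)
    print(f"{name:<10} ${price:.4f}/hr  ${monthly_cost(price):,.0f}/mo  {saving:.0%} below On-Demand")
```

Spot prices move with supply and demand, so treat the 70% figure as an average to plug in, not a quote.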
Ideal Spot Workloads
- Batch data processing: ETL jobs, Spark/EMR, Glue — restart from checkpoint on interruption
- CI/CD pipelines: Build agents on Spot — job fails and retries if interrupted
- Machine learning training: Checkpoint models every 10 minutes — restart from checkpoint
- Stateless web servers in auto-scaling groups: Maintain minimum On-Demand + scale with Spot
- Development and test environments: Interruption tolerance acceptable
- Video encoding / rendering: Batch jobs that can restart
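Most of the workloads above rely on the same pattern: save progress periodically and resume from the last checkpoint after an interruption. Here is a minimal sketch of that pattern, assuming a job that sums a list of numbers. The function names, the JSON state layout, and the `stop_after` parameter (used only to simulate an interruption) are hypothetical. In practice the checkpoint would need to live on storage that outlives the instance, such as an object store or network disk.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_item": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temp file and rename, so a kill mid-write cannot corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_job(items, path, checkpoint_every=100, stop_after=None):
    """Sum `items`, resuming from the checkpoint at `path` if one exists.

    `stop_after` simulates a Spot interruption: the function returns None
    after processing that many items in this run, without a final save.
    """
    state = load_checkpoint(path)
    processed = 0
    for i in range(state["next_item"], len(items)):
        if stop_after is not None and processed >= stop_after:
            return None  # simulated interruption
        state["total"] += items[i]
        state["next_item"] = i + 1
        processed += 1
        if state["next_item"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt.json")
    data = list(range(1000))
    print(run_job(data, ckpt, stop_after=250))  # interrupted run returns None
    print(run_job(data, ckpt))                  # resumes from item 200 and finishes
```

Work done after the last checkpoint is repeated on restart, so the checkpoint interval sets the maximum amount of lost work per interruption.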
Workloads NOT Suitable for Spot
- Primary production databases
- Real-time transaction processing
- Long-running stateful services without checkpointing
- ZooKeeper, etcd quorum nodes (losing quorum is catastrophic)
AWS Spot Best Practices
Instance Diversification
Don't request a single instance type. Request 3-5 similar instance types in your Spot fleet: if the m7i.xlarge pool dries up, the fleet can fall back to m7i-flex.xlarge or m6i.xlarge. More diversity means a lower interruption probability.
Capacity-Optimised Allocation Strategy
AWS Spot allocation strategies (identifiers as spelled in the EC2 Fleet and EC2 Auto Scaling APIs; Spot Fleet uses camelCase equivalents such as capacityOptimized):
- capacity-optimized: AWS launches from the deepest Spot pools, giving the lowest interruption rate
- price-capacity-optimized: Balances price and interruption risk; recommended for most use cases
- lowest-price: Cheapest but highest interruption rate; only for tolerant batch jobs
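Diversification and the allocation strategy are usually set together. The sketch below builds a plain dictionary shaped like the `MixedInstancesPolicy` argument of the EC2 Auto Scaling `CreateAutoScalingGroup` call. It makes no AWS calls. The helper name `build_mixed_instances_policy` and the launch template name `web-template` are hypothetical, and the field names should be checked against the current API reference before use.

```python
def build_mixed_instances_policy(
    launch_template_name,
    instance_types,
    on_demand_base=0,
    on_demand_pct_above_base=0,
    spot_allocation_strategy="price-capacity-optimized",
):
    """Return a dict shaped like an Auto Scaling MixedInstancesPolicy.

    Refuses fewer than three instance types, mirroring the 3-5 type
    diversification guidance in the text.
    """
    if len(instance_types) < 3:
        raise ValueError("diversify across at least 3 instance types")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type the group may launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": spot_allocation_strategy,
        },
    }


policy = build_mixed_instances_policy(
    "web-template",  # hypothetical launch template name
    ["m7i.xlarge", "m7i-flex.xlarge", "m6i.xlarge"],
    on_demand_pct_above_base=20,  # keep 20% On-Demand, fill the rest with Spot
)
print(policy["InstancesDistribution"])
```

With boto3, a dictionary like this would be passed as `MixedInstancesPolicy=policy` to the Auto Scaling client's `create_auto_scaling_group`, alongside the group name, size limits, and subnets.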
GCP Spot VM Specifics
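GCP Spot VMs are the successor to Preemptible VMs. Legacy Preemptible VMs are capped at 24 hours of runtime: at 24 hours, GCP stops the VM regardless. Spot VMs have no maximum runtime, but can still be preempted at any time. If you run on Preemptible VMs, account for the cap in job design: jobs longer than 20 hours should checkpoint and restart. In both cases GCP gives 30 seconds' notice via an ACPI signal before stopping the VM.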
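With only 30 seconds of notice, a job needs to detect preemption quickly. Besides the ACPI signal, Compute Engine exposes a `preempted` value on the instance metadata server. The sketch below polls it using only the standard library. The function names, the one-second poll interval, and the injectable `fetch` and `sleep` parameters (there so the loop can be exercised off-GCP) are assumptions for illustration.

```python
import time
import urllib.request

PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)


def fetch_preempted_flag(timeout=1.0):
    """Read the raw metadata value; expected to be 'TRUE' or 'FALSE'."""
    request = urllib.request.Request(
        PREEMPTED_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode().strip()


def is_preempted(fetch=fetch_preempted_flag):
    """True once the VM has been marked preempted; False if unknown."""
    try:
        return fetch() == "TRUE"
    except OSError:
        # Metadata server unreachable (for example, not running on GCP).
        return False


def wait_for_preemption(on_preempt, fetch=fetch_preempted_flag,
                        poll_seconds=1.0, max_polls=None, sleep=time.sleep):
    """Poll until preempted, then call on_preempt(). Returns True if it fired."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if is_preempted(fetch):
            on_preempt()  # e.g. flush a final checkpoint
            return True
        polls += 1
        sleep(poll_seconds)
    return False
```

The callback should finish well inside the notice window, so it is safer to flush a small final checkpoint there than to start a long upload.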
Spot + On-Demand Hybrid Pattern
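Best practice for production workloads: run a minimum On-Demand baseline (for example, 20% of capacity) to ensure service continuity, and fill the remaining capacity with Spot for significant savings. The table below prices a 50-instance fleet using the per-instance m7i.2xlarge figures above.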
| Configuration | Monthly Cost (50 instances) |
|---|---|
| 100% On-Demand | $14,700 |
| 20% On-Demand + 80% Spot (70% disc) | $6,468 (-56%) |
| 100% 3yr RI | $5,900 (-60%) |
| 20% RI + 80% Spot (70% disc) | $4,708 (-68%) |
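The blended figures follow from a simple weighted sum, sketched below. The per-instance monthly prices come from the m7i.2xlarge table earlier in the guide; the helper name `blended_monthly_cost` is illustrative.

```python
def blended_monthly_cost(instances, baseline_share, baseline_monthly, spot_monthly):
    """Monthly fleet cost when `baseline_share` of instances run on the
    baseline purchase model and the remainder run on Spot."""
    baseline_count = round(instances * baseline_share)
    spot_count = instances - baseline_count
    return baseline_count * baseline_monthly + spot_count * spot_monthly


ON_DEMAND_MONTHLY = 294.0                          # per instance, from the pricing table
SPOT_MONTHLY = ON_DEMAND_MONTHLY * (1 - 0.70)      # assumes a 70% Spot discount

all_on_demand = blended_monthly_cost(50, 1.0, ON_DEMAND_MONTHLY, SPOT_MONTHLY)
hybrid = blended_monthly_cost(50, 0.2, ON_DEMAND_MONTHLY, SPOT_MONTHLY)

print(f"100% On-Demand:           ${all_on_demand:,.0f}")
print(f"20% On-Demand + 80% Spot: ${hybrid:,.0f} ({1 - hybrid / all_on_demand:.0%} saving)")
```

Swapping `baseline_monthly` for the Reserved Instance rate gives the RI-based rows in the same way.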
For batch, CI/CD, and ML training workloads, Spot should be your default compute choice. Paired with proper checkpointing, these workloads can finish about as reliably as on On-Demand at 70-90% lower cost.