Spot & Preemptible Instances
Cloud providers sell spare compute capacity at heavily discounted prices: AWS calls this Spot Instances, GCP calls it Preemptible VMs (or Spot VMs in newer terminology), and Azure calls it Spot Virtual Machines. In exchange for the discount, the provider can reclaim the capacity on short notice when demand increases.
CostPilot detects instance pricing type automatically and reflects the actual cost of each node in your cost breakdowns — so spot savings are immediately visible.
What spot and preemptible means
Spot instances are spare capacity that cloud providers sell at a fraction of on-demand price. The key characteristics:
- Significantly cheaper: discounts of 60–90% compared to equivalent on-demand instances
- Interruptible: the provider can terminate the instance with short notice (2 minutes for AWS, 30 seconds for GCP and Azure)
- No SLA: spot instances carry no availability guarantee
- Capacity-dependent: AWS Spot prices fluctuate with supply and demand; GCP and Azure publish discounted rates that vary by region and machine type and can change over time
Typical discounts by provider
| Provider | Instance type | Typical discount vs on-demand |
|---|---|---|
| AWS | EC2 Spot Instances | 60–90% cheaper |
| GCP | Preemptible / Spot VMs | ~70% cheaper |
| Azure | Azure Spot Virtual Machines | ~80% cheaper |
AWS Spot pricing fluctuates in real time based on the Spot market. The 60–90% range represents typical savings — some instance types and availability zones offer consistent 80%+ discounts, while others are closer to 60% during high-demand periods.
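To make the discount range concrete, here is a minimal sketch of the arithmetic. The on-demand rate of $0.40/hour, the node count, and the function names are illustrative assumptions, not real prices and not part of CostPilot.

```python
HOURS_PER_MONTH = 730  # assumption: average month length (8760 hours / 12)


def spot_rate(on_demand_rate: float, discount: float) -> float:
    """Hourly spot price for a given on-demand price and fractional discount."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be a fraction in [0, 1)")
    return on_demand_rate * (1.0 - discount)


def monthly_savings(on_demand_rate: float, discount: float, nodes: int) -> float:
    """Monthly saving from running `nodes` nodes on spot instead of on-demand."""
    hourly_saving = on_demand_rate - spot_rate(on_demand_rate, discount)
    return hourly_saving * HOURS_PER_MONTH * nodes


if __name__ == "__main__":
    # Hypothetical on-demand price of $0.40/hour across 10 nodes.
    for discount in (0.60, 0.90):
        saving = monthly_savings(0.40, discount, nodes=10)
        print(f"{discount:.0%} discount: ${saving:,.2f} saved per month")
```

Under these assumed numbers, the two ends of the range work out to roughly $1,752 and $2,628 saved per month, which shows how much the achieved discount matters at even modest scale.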
How CostPilot detects pricing type
CostPilot’s agent reads node labels and annotations set by the cloud provider’s node group controller. These labels indicate whether a node is running on spot/preemptible capacity.
| Provider | Node label | Spot value |
|---|---|---|
| AWS (Karpenter) | karpenter.sh/capacity-type | spot |
| AWS (EKS managed group) | eks.amazonaws.com/capacityType | SPOT |
| GCP (GKE) | cloud.google.com/gke-spot | true |
| Azure (AKS) | kubernetes.azure.com/scalesetpriority | spot |
When CostPilot detects one of these labels on a node, it prices that node’s capacity at the spot rate rather than the on-demand rate. This means your cost data accurately reflects what you are actually paying — not what the same capacity would cost at full price.
If your cluster uses a custom node provisioner or a managed node group that does not set standard labels, CostPilot may fall back to on-demand pricing for those nodes. Contact support if you believe spot nodes are being priced incorrectly.
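The label lookup described in this section can be sketched as follows. This is an illustration of the idea using the labels from the table above, not CostPilot's actual implementation; `SPOT_LABELS` and `is_spot_node` are hypothetical names.

```python
from typing import Mapping

# Label keys and spot values taken from the detection table in this section.
SPOT_LABELS = {
    "karpenter.sh/capacity-type": "spot",
    "eks.amazonaws.com/capacityType": "SPOT",
    "cloud.google.com/gke-spot": "true",
    "kubernetes.azure.com/scalesetpriority": "spot",
}


def is_spot_node(labels: Mapping[str, str]) -> bool:
    """Return True if any known spot label is present with its spot value."""
    return any(labels.get(key) == value for key, value in SPOT_LABELS.items())


if __name__ == "__main__":
    print(is_spot_node({"eks.amazonaws.com/capacityType": "SPOT"}))       # spot node
    print(is_spot_node({"eks.amazonaws.com/capacityType": "ON_DEMAND"}))  # on-demand node
    print(is_spot_node({"team": "payments"}))  # no standard label: treated as on-demand
```

A node with none of these labels falls through to the on-demand case, which mirrors the fallback behaviour described above for custom provisioners.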
How spot costs appear in Cost Explorer
In Cost Explorer, use the Pricing type dimension to break down costs by instance category:
| Pricing type | Description |
|---|---|
| on-demand | Standard on-demand instance pricing |
| spot | AWS Spot Instance pricing |
| preemptible | GCP Preemptible / Spot VM pricing |
| reserved | Reserved or committed-use instance pricing (where detected) |
This dimension lets you see:
- What percentage of your node cost comes from spot capacity
- Which namespaces or workloads benefit from spot pricing
- Whether spot adoption is growing or shrinking over time
You can combine the Pricing type dimension with Cluster or Namespace filters to understand spot coverage per environment.
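If you work with cost data outside the UI, the same breakdown is a simple aggregation. The sketch below assumes a hypothetical list of rows with `namespace`, `pricing_type`, and `cost` fields; that shape is an assumption made for illustration, not a documented CostPilot export format.

```python
from collections import defaultdict

# Pricing types from the table above that count as interruptible capacity.
INTERRUPTIBLE = {"spot", "preemptible"}


def spot_share_by_namespace(rows):
    """Return {namespace: fraction of cost on spot/preemptible capacity}."""
    totals = defaultdict(float)
    spot = defaultdict(float)
    for row in rows:
        totals[row["namespace"]] += row["cost"]
        if row["pricing_type"] in INTERRUPTIBLE:
            spot[row["namespace"]] += row["cost"]
    # Skip namespaces with zero total cost to avoid dividing by zero.
    return {ns: spot[ns] / total for ns, total in totals.items() if total > 0}


if __name__ == "__main__":
    # Hypothetical sample data.
    rows = [
        {"namespace": "batch", "pricing_type": "spot", "cost": 300.0},
        {"namespace": "batch", "pricing_type": "on-demand", "cost": 100.0},
        {"namespace": "api", "pricing_type": "on-demand", "cost": 200.0},
    ]
    for namespace, share in spot_share_by_namespace(rows).items():
        print(f"{namespace}: {share:.0%} of cost on spot")
```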
Which workloads are suitable for spot
Good candidates for spot
- Batch and data processing jobs — can be retried on interruption; cost savings are significant at scale
- CI/CD runners — jobs are short-lived and naturally retry-safe
- Development and staging environments — interruption is a minor inconvenience, not a production incident
- Stateless, horizontally-scaled services — if you run 10 replicas, losing 1–2 to spot interruption is tolerable with correct PodDisruptionBudgets configured
- Machine learning training — workloads that checkpoint state can resume after interruption
Poor candidates for spot
- Databases and stateful stores — data loss or corruption risk on sudden termination
- Single-replica critical services — no redundancy means spot interruption causes an outage
- Long-running sessions — WebSocket connections and streaming workloads are severed on termination
- Workloads without graceful shutdown handling: if the app does not handle SIGTERM correctly, the notice window (2 minutes on AWS, 30 seconds on GCP and Azure) does not help, and in-flight work is lost
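As an illustration of the last point, here is a minimal sketch of SIGTERM handling in a worker process. Kubernetes sends SIGTERM when a pod is evicted and follows up with SIGKILL once the pod's termination grace period expires. The names `run_worker` and `shutdown_requested` are hypothetical, and a real service would also need to stop accepting new requests.

```python
import signal
import threading

# Set when SIGTERM arrives; the work loop checks it between items.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only record the request, do the real work in the loop."""
    shutdown_requested.set()


def run_worker(process_item, items):
    """Process items until finished or shutdown is requested.

    Returns the items that were not processed so they can be requeued.
    """
    remaining = list(items)
    while remaining and not shutdown_requested.is_set():
        process_item(remaining.pop(0))
    return remaining


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, handle_sigterm)
    leftover = run_worker(print, ["job-1", "job-2", "job-3"])
    print("left to requeue:", leftover)
```

The design choice here is to finish the current item and hand back the rest, so an interruption costs at most one item's worth of repeated work.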
Handling spot interruptions in Kubernetes
Configure your workloads to handle spot interruptions gracefully:
# Pod spec fragment: ensure pods spread across zones to survive zone-level spot reclamation
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-service
          topologyKey: topology.kubernetes.io/zone
---
# Protect against too many simultaneous evictions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-service
For AWS, the Node Termination Handler (a DaemonSet) watches the EC2 instance metadata service for spot interruption notices, then cordons and drains the node before it is terminated so that pods can be rescheduled elsewhere.
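For intuition, the kind of check such a handler performs can be sketched as below. This is not the Node Termination Handler's code. It assumes the process runs on an EC2 instance with IMDSv2 reachable, and the function names are hypothetical. The `spot/instance-action` metadata path returns 404 until an interruption is scheduled, then a small JSON document containing the action and its UTC time.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the raw instance-action document, or None if no notice is pending."""
    token_request = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()
    action_request = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_request, timeout=timeout) as response:
            return response.read().decode()
    except urllib.error.HTTPError as error:
        if error.code == 404:  # no interruption scheduled
            return None
        raise


def seconds_until_interruption(document: str, now: datetime) -> float:
    """Seconds between `now` (timezone-aware) and the scheduled interruption."""
    payload = json.loads(document)
    deadline = datetime.strptime(payload["time"], "%Y-%m-%dT%H:%M:%SZ")
    return (deadline.replace(tzinfo=timezone.utc) - now).total_seconds()
```

A polling loop would call `fetch_instance_action()` every few seconds and start draining as soon as it returns a document; in practice, running the Node Termination Handler is preferable to maintaining such a loop yourself.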
Cost optimisation strategy
A common pattern is a mixed node pool architecture:
- A small on-demand base node pool sized for your minimum stable workload (guaranteed availability for critical services)
- A larger spot node pool with taints for all interruptible workloads
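# Deployment pod spec tolerating and selecting spot nodes (AWS example).
# The toleration only takes effect if you taint the spot node group with this key and value;
# EKS sets the capacityType label automatically but does not add a taint.
tolerations:
  - key: "eks.amazonaws.com/capacityType"
    operator: "Equal"
    value: "SPOT"
    effect: "NoSchedule"
nodeSelector:
  eks.amazonaws.com/capacityType: SPOT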
In CostPilot, the Pricing type dimension shows you the cost split between on-demand and spot. In a healthy mixed cluster, spot capacity typically accounts for 40–70% of node cost, delivering substantial savings on the variable portion of the workload.
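Because spot capacity is cheaper, its share of cost is smaller than its share of capacity. The sketch below relates the two under simplifying assumptions: every node has the same on-demand price and all spot nodes get the same discount. The function names and sample numbers are illustrative only.

```python
def _check(spot_capacity_fraction: float, discount: float) -> None:
    if not 0.0 <= spot_capacity_fraction <= 1.0:
        raise ValueError("spot_capacity_fraction must be in [0, 1]")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")


def blended_savings(spot_capacity_fraction: float, discount: float) -> float:
    """Fraction of the all-on-demand bill saved by the mixed cluster."""
    _check(spot_capacity_fraction, discount)
    return spot_capacity_fraction * discount


def spot_cost_share(spot_capacity_fraction: float, discount: float) -> float:
    """Share of the actual node bill that is spent on spot capacity."""
    _check(spot_capacity_fraction, discount)
    spot_cost = spot_capacity_fraction * (1.0 - discount)
    on_demand_cost = 1.0 - spot_capacity_fraction
    return spot_cost / (spot_cost + on_demand_cost)


if __name__ == "__main__":
    # Hypothetical cluster: 80% of capacity on spot at a 70% discount.
    print(f"saved vs all on-demand: {blended_savings(0.8, 0.7):.0%}")
    print(f"share of spend on spot: {spot_cost_share(0.8, 0.7):.0%}")
```

With these assumed inputs, the cluster saves about 56% against an all-on-demand bill while about 55% of its remaining spend is on spot, which falls inside the 40–70% range mentioned above.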