Spot & Preemptible Instances

Cloud providers sell spare compute capacity at heavily discounted prices — AWS calls them Spot Instances, GCP calls them Preemptible VMs (or Spot VMs in newer terminology), and Azure calls them Spot Virtual Machines. In exchange for the discount, the provider can reclaim the capacity with short notice when demand increases.

CostPilot detects instance pricing type automatically and reflects the actual cost of each node in your cost breakdowns — so spot savings are immediately visible.

What spot and preemptible mean

Spot instances are spare capacity that cloud providers sell at a fraction of on-demand price. The key characteristics:

Significantly cheaper: discounts of 60–90% compared to equivalent on-demand instances
Interruptible: the provider can terminate the instance with short notice (2 minutes for AWS, 30 seconds for GCP and Azure)
No SLA: spot instances carry no availability guarantee
Capacity-dependent: availability varies with spare capacity, and on AWS and Azure the price also moves with supply and demand; GCP discounts are more stable but can still change over time

Typical discounts by provider

Provider	Offering	Typical discount vs on-demand
AWS	EC2 Spot Instances	60–90%
GCP	Preemptible / Spot VMs	~70%
Azure	Azure Spot Virtual Machines	~80%

ℹ Note

AWS Spot pricing fluctuates in real time based on the Spot market. The 60–90% range represents typical savings — some instance types and availability zones offer consistent 80%+ discounts, while others are closer to 60% during high-demand periods.
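As a worked illustration of what the 60–90% range means for a single node, the sketch below converts a fractional discount into an hourly spot price and a monthly saving. The $0.20/hour on-demand price and the 730-hour month are hypothetical inputs chosen for the arithmetic, not real list prices.

```python
def spot_hourly_price(on_demand_price: float, discount: float) -> float:
    """Return the hourly spot price implied by a fractional discount (0.0 to 1.0)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be a fraction between 0 and 1")
    return on_demand_price * (1.0 - discount)


def monthly_savings(on_demand_price: float, discount: float, hours: float = 730.0) -> float:
    """Return the saving versus on-demand for one node running `hours` per month."""
    return (on_demand_price - spot_hourly_price(on_demand_price, discount)) * hours


# Hypothetical node priced at $0.20/hour on demand, at both ends of the range.
for discount in (0.60, 0.90):
    price = spot_hourly_price(0.20, discount)
    saved = monthly_savings(0.20, discount)
    print(f"{discount:.0%} discount: ${price:.3f}/hour, ${saved:.2f} saved per month")
```

For this hypothetical node, the gap between a 60% and a 90% discount is the difference between paying about $0.08 and about $0.02 per hour, which is why the instance type and availability zone you choose matter.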

How CostPilot detects pricing type

CostPilot’s agent reads node labels and annotations set by the cloud provider’s node group controller or by a provisioner such as Karpenter. These labels indicate whether a node is running on spot/preemptible capacity.

Provider	Node label	Spot value
AWS (Karpenter)	`karpenter.sh/capacity-type`	`spot`
AWS (EKS managed group)	`eks.amazonaws.com/capacityType`	`SPOT`
GCP (GKE)	`cloud.google.com/gke-spot`	`true`
Azure (AKS)	`kubernetes.azure.com/scalesetpriority`	`spot`

When CostPilot detects one of these labels on a node, it prices that node’s capacity at the spot rate rather than the on-demand rate. This means your cost data accurately reflects what you are actually paying — not what the same capacity would cost at full price.
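The label check described above can be sketched in a few lines of Python. This is an illustration of the logic only, not CostPilot's actual implementation: the function names `is_spot_node` and `hourly_rate` are hypothetical, and the label keys and values are the ones from the table.

```python
# Label keys and spot values copied from the table above.
SPOT_LABELS = {
    "karpenter.sh/capacity-type": "spot",
    "eks.amazonaws.com/capacityType": "SPOT",
    "cloud.google.com/gke-spot": "true",
    "kubernetes.azure.com/scalesetpriority": "spot",
}


def is_spot_node(node_labels: dict[str, str]) -> bool:
    """Return True if any known label marks the node as spot/preemptible."""
    return any(node_labels.get(key) == value for key, value in SPOT_LABELS.items())


def hourly_rate(node_labels: dict[str, str], on_demand_rate: float, spot_rate: float) -> float:
    """Pick the spot rate for labelled nodes, otherwise fall back to on-demand."""
    return spot_rate if is_spot_node(node_labels) else on_demand_rate


# A Karpenter-provisioned spot node and an unlabelled node (hypothetical rates).
print(hourly_rate({"karpenter.sh/capacity-type": "spot"}, 0.20, 0.06))
print(hourly_rate({"team": "payments"}, 0.20, 0.06))
```

A node with none of the known labels is treated as on-demand in this sketch, which mirrors the fallback behaviour for clusters that do not set standard labels.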

✦ Tip

If your cluster uses a custom node provisioner or a managed node group that does not set standard labels, CostPilot may fall back to on-demand pricing for those nodes. Contact support if you believe spot nodes are being priced incorrectly.

How spot costs appear in Cost Explorer

In Cost Explorer, use the Pricing type dimension to break down costs by instance category:

Pricing type	Description
`on-demand`	Standard on-demand instance pricing
`spot`	AWS Spot Instance pricing
`preemptible`	GCP Preemptible / Spot VM pricing
`reserved`	Reserved or committed-use instance pricing (where detected)

This dimension lets you see:

What percentage of your node cost comes from spot capacity
Which namespaces or workloads benefit from spot pricing
Whether spot adoption is growing or shrinking over time

You can combine the Pricing type dimension with Cluster or Namespace filters to understand spot coverage per environment.
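To make the idea of spot coverage concrete, the following sketch computes the fraction of cost on spot capacity per namespace. The row shape `(namespace, pricing_type, cost)` and the sample figures are assumptions made for illustration; they are not a CostPilot export format.

```python
from collections import defaultdict


def spot_share_by_namespace(rows):
    """Return {namespace: fraction of cost on spot or preemptible capacity}.

    `rows` is an iterable of (namespace, pricing_type, cost) tuples.
    """
    totals = defaultdict(float)
    spot = defaultdict(float)
    for namespace, pricing_type, cost in rows:
        totals[namespace] += cost
        if pricing_type in ("spot", "preemptible"):
            spot[namespace] += cost
    # Skip namespaces with zero total cost to avoid dividing by zero.
    return {ns: spot[ns] / total for ns, total in totals.items() if total > 0}


# Hypothetical monthly costs in dollars.
rows = [
    ("batch", "spot", 300.0),
    ("batch", "on-demand", 100.0),
    ("api", "on-demand", 200.0),
]
print(spot_share_by_namespace(rows))
```

In this made-up data the `batch` namespace has three quarters of its cost on spot while `api` has none, which is the kind of gap the Pricing type dimension is meant to surface.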

Which workloads are suitable for spot

Good candidates for spot

Batch and data processing jobs — can be retried on interruption; cost savings are significant at scale
CI/CD runners — jobs are short-lived and naturally retry-safe
Development and staging environments — interruption is a minor inconvenience, not a production incident
Stateless, horizontally-scaled services — if you run 10 replicas, losing 1–2 to spot interruption is tolerable with correct PodDisruptionBudgets configured
Machine learning training — workloads that checkpoint state can resume after interruption

Poor candidates for spot

Databases and stateful stores: data loss or corruption risk on sudden termination
Single-replica critical services: no redundancy means spot interruption causes an outage
Long-running sessions: WebSocket connections and streaming workloads are severed on termination
Workloads without graceful shutdown handling: if the app does not handle SIGTERM correctly, the notice period (2 minutes on AWS, 30 seconds on GCP and Azure) is wasted

Handling spot interruptions in Kubernetes

Configure your workloads to handle spot interruptions gracefully:

```yaml
# Fragment for a pod template (spec.template.spec of a Deployment):
# prefer spreading replicas across zones to survive zone-level spot reclamation
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-service
          topologyKey: topology.kubernetes.io/zone

---
# Protect against too many simultaneous evictions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-service
```

For AWS, the AWS Node Termination Handler (run as a DaemonSet) watches the EC2 instance metadata service for spot interruption notices, then cordons and drains the node before it is terminated, giving pods time to be rescheduled.
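Rescheduling only goes smoothly if the application itself reacts to the SIGTERM it receives when its pod is evicted. The sketch below shows one common pattern in Python: the signal handler sets a flag, and the work loop checks it between items so unfinished work can be checkpointed or re-queued. The names `run` and `process_item` are hypothetical, and a real service would also need to finish within its pod's termination grace period.

```python
import signal
import threading

# Set when the process has been asked to shut down.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; the work loop does the actual cleanup."""
    shutdown_requested.set()


# Register the handler (must be called from the main thread).
signal.signal(signal.SIGTERM, handle_sigterm)


def run(process_item, items):
    """Process items until done or until SIGTERM arrives; return what is left."""
    remaining = list(items)
    while remaining and not shutdown_requested.is_set():
        process_item(remaining.pop(0))
    # The caller can checkpoint or re-queue anything returned here.
    return remaining
```

Keeping the handler this small is a deliberate choice: it returns immediately, and the cleanup happens in normal program flow where it is easier to reason about.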

Cost optimisation strategy

A common pattern is a mixed node pool architecture:

A small on-demand base node pool sized for your minimum stable workload (guaranteed availability for critical services)
A larger spot node pool with taints for all interruptible workloads

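```yaml
# Deployment tolerating spot nodes (AWS example).
# Assumes you have tainted the spot node pool with
# eks.amazonaws.com/capacityType=SPOT:NoSchedule.
tolerations:
  - key: "eks.amazonaws.com/capacityType"
    operator: "Equal"
    value: "SPOT"
    effect: "NoSchedule"
# Schedule only onto nodes that carry the EKS spot label
nodeSelector:
  eks.amazonaws.com/capacityType: SPOT
```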

In CostPilot, the Pricing type dimension shows you the cost split between on-demand and spot. A healthy mixed cluster typically has 40–70% of its node cost on spot capacity, which delivers substantial savings on the interruptible portion of the workload.
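A spot share of cost does not translate directly into a saving, because each dollar spent on spot replaces more than a dollar of on-demand spend. The sketch below estimates the overall saving versus an all-on-demand cluster from two inputs: the share of actual node cost on spot and an assumed average spot discount. Both example values are hypothetical, and the formula assumes a single uniform discount across all spot nodes.

```python
def overall_savings(spot_cost_share: float, spot_discount: float) -> float:
    """Fractional saving versus running the same capacity entirely on demand."""
    if not 0.0 <= spot_cost_share <= 1.0:
        raise ValueError("spot_cost_share must be between 0 and 1")
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be at least 0 and below 1")
    # Per dollar actually spent: the on-demand part costs the same, while the
    # spot part would have cost 1 / (1 - discount) times as much on demand.
    on_demand_equivalent = (1.0 - spot_cost_share) + spot_cost_share / (1.0 - spot_discount)
    return 1.0 - 1.0 / on_demand_equivalent


# Hypothetical cluster: half of node cost on spot at an average 70% discount.
print(f"{overall_savings(0.5, 0.7):.1%}")
```

With 40% of node cost on spot at a 60% discount, for example, the same capacity would have cost 1.6 times as much on demand, so the overall saving is 37.5%.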