Reduce Idle Costs
Idle cost is the portion of your infrastructure spend that produces no useful work. In Kubernetes, it appears in two distinct forms — overprovisioned capacity (pods requesting far more than they use) and unallocated capacity (node resources that no pod has claimed at all).
CostPilot measures both and gives you the data to act on them. This guide walks through identifying which type you have, the right remediation strategy for each, and how to track progress over time.
Understanding the two types of idle cost
| Type | Cause | Where it shows |
|---|---|---|
| Overprovisioned | Pod requests exceed actual usage | Efficiency score below B; right-sizing insights |
| Unallocated | Node capacity with no pod scheduled | Idle cost breakdown on the Dashboard |
Overprovisioned cost is attributable — you know which team or workload is responsible. Unallocated cost belongs to the cluster as a whole and must be tackled at the node pool level.
Step 1: Identify which type you have
Open the Dashboard and look at the Idle cost breakdown card. It shows:
- Overprovisioned idle — requests minus usage, summed across all pods
- Unallocated idle — node capacity minus total pod requests
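The two formulas above can be sketched with some made-up numbers (these figures are illustrative, not CostPilot's internal implementation):

```shell
# Illustrative: a pod requesting 8 cores but using 3, on a 12-core node
requests=8
usage=3
node_capacity=12

overprovisioned=$(( requests - usage ))      # idle inside the pod's reservation
unallocated=$(( node_capacity - requests ))  # idle outside any reservation
echo "overprovisioned: ${overprovisioned} cores, unallocated: ${unallocated} cores"
```

The same split applies to memory; CostPilot sums both dimensions across the cluster, weighted by instance pricing.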
Most clusters have both types. Start with whichever is larger. In newly provisioned clusters, unallocated idle tends to dominate. In mature clusters with legacy workloads, overprovisioned idle is usually the bigger issue.
Step 2: Tackle overprovisioned idle — right-size requests
For overprovisioned idle, the fix is reducing pod resource requests to match actual usage.
- Navigate to Cost Explorer → Workload, sorted by Efficiency (ascending).
- Open a low-efficiency workload and read the Insight recommendations.
- Update `resources.requests` in your deployment manifest to match the recommended values (P95 usage × 1.15).
- Roll out and monitor for OOMKill or CPU throttling.
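For reference, the relevant deployment fragment after right-sizing might look like this (the values are illustrative, not recommendations):

```yaml
# Illustrative right-sized requests, using the P95 x 1.15 rule
resources:
  requests:
    cpu: "300m"      # e.g. P95 of 260m x 1.15 ≈ 299m, rounded up
    memory: "590Mi"  # e.g. P95 of 512Mi x 1.15 ≈ 589Mi, rounded up
```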
Full details are covered in Right-size Workloads.
Realistic savings target: A cluster with average efficiency of 40% (grade E) can typically reach 70–80% (grade B–C) with one round of right-sizing. This translates to a 30–50% reduction in overprovisioned idle cost, though actual node cost savings depend on whether right-sizing allows nodes to be removed.
Step 3: Tackle unallocated idle — right-size your node pool
Unallocated idle means you are paying for nodes that have spare capacity no pod is using. The remediation depends on your setup.
Option A — Enable Cluster Autoscaler (recommended)
If your cloud provider supports it, Cluster Autoscaler removes underutilised nodes automatically.
```shell
# Example: GKE node pool with autoscaling
gcloud container clusters update <cluster-name> \
  --enable-autoscaling \
  --min-nodes=2 \
  --max-nodes=10 \
  --node-pool=<pool-name>
```
With autoscaling enabled, CostPilot will show unallocated idle falling over the following days as the autoscaler removes spare nodes.
Set `--min-nodes` conservatively — at least enough to handle your overnight/off-peak baseline. Autoscaler cannot remove the last node in a pool, so a min of 1 is safe but means one node always runs.
Option B — Manually reduce node pool size
If autoscaling is not available or you prefer manual control:
- In Cost Explorer, check Node dimension to see per-node utilisation.
- Identify nodes with consistently low allocation (below 30% of capacity).
- Cordon and drain those nodes, then remove them from the pool.
```shell
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Then delete the node from your cloud provider console or CLI
```
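The "below 30% of capacity" check is a simple ratio of scheduled pod requests to node capacity. A sketch with assumed figures:

```shell
# Allocation % for a single node: total pod requests over node capacity
# (figures assumed for the example)
node_capacity_m=4000   # 4-core node, in millicores
pod_requests_m=1000    # sum of requests of all pods scheduled on it
allocation_pct=$(( pod_requests_m * 100 / node_capacity_m ))
echo "${allocation_pct}%"   # below 30%, so a removal candidate
```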
Realistic savings target: Removing one underutilised node on a standard 4-core instance (e.g. AWS m5.xlarge at ~£120/month) saves that full amount. On a cluster running 10 nodes at 25% average allocation, it is common to reduce to 6–7 nodes — a 30–40% infrastructure saving.
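The node-count arithmetic behind that example: 10 nodes at 25% average allocation means the workload only needs 2.5 nodes' worth of capacity. Assuming a conservative 40% post-reduction allocation target (an assumption for illustration, not a CostPilot recommendation), ceiling division gives the pool size needed:

```shell
# How many nodes does 10 x 25% of demand actually need at a 40% target?
nodes=10
avg_alloc_pct=25
target_alloc_pct=40
# ceiling division in integer shell arithmetic
needed=$(( (nodes * avg_alloc_pct + target_alloc_pct - 1) / target_alloc_pct ))
echo "${needed} nodes"
```

A higher target allocation shrinks the pool further but leaves less headroom for traffic spikes.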
Step 4: Use spot instances for variable workloads
For workloads that can tolerate interruption, spot/preemptible instances dramatically reduce node costs:
| Provider | Typical discount | Interruption notice |
|---|---|---|
| AWS Spot | 60–90% cheaper | 2 minutes |
| GCP Preemptible | ~70% cheaper | 30 seconds |
| Azure Spot | ~80% cheaper | 30 seconds |
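Applying a typical ~70% discount to the m5.xlarge figure used earlier (~£120/month; both numbers illustrative):

```shell
# Monthly node cost at on-demand vs. spot pricing
on_demand=120    # GBP/month
discount_pct=70
spot=$(( on_demand * (100 - discount_pct) / 100 ))
echo "£${spot}/month"
```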
Suitable workloads for spot nodes:
- Batch jobs and data pipelines
- CI/CD runners
- Development and staging environments
- Stateless, horizontally scaled services with multiple replicas
Move these workloads to a dedicated spot node pool using node selectors or taints:
```yaml
# Toleration for spot nodes
tolerations:
- key: "cloud.google.com/gke-spot"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
nodeSelector:
  cloud.google.com/gke-spot: "true"
```
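On GKE, a dedicated spot pool matching that toleration could be created along these lines (the pool name is an assumption, and the taint must be applied explicitly — check your provider's documentation):

```shell
# Create a spot node pool and taint it so only tolerating pods schedule there
gcloud container node-pools create spot-pool \
  --cluster=<cluster-name> \
  --spot \
  --node-taints="cloud.google.com/gke-spot=true:NoSchedule"
```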
CostPilot will show the lower spot pricing in Cost Explorer under the Pricing type dimension, making the saving visible.
Do not run stateful workloads (databases, Kafka brokers, ZooKeeper) on spot nodes without a robust failover strategy. The eviction window (two minutes on AWS, as little as 30 seconds on GCP and Azure) is not enough time for most stateful systems to hand off data gracefully.
Step 5: Track progress
After making changes, return to the Dashboard and watch the Idle cost trend over the following 7–14 days. You should see:
- Overprovisioned idle falling as right-sized pods come online
- Unallocated idle falling as autoscaler removes spare nodes or node pool size is reduced
Set a baseline alert to catch idle cost creeping back up. In Settings → Alerts, create an alert with:
- Type: Percentage change
- Scope: Account-wide
- Threshold: +20% week-over-week
- Channel: Slack or email
This ensures that new workloads deployed with generous requests do not silently erode the savings you have made.
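The percentage-change evaluation the alert performs is straightforward (spend figures assumed for the example):

```shell
# Week-over-week change % on idle cost
last_week=500   # GBP
this_week=620   # GBP
change_pct=$(( (this_week - last_week) * 100 / last_week ))
echo "+${change_pct}% week-over-week"   # above the +20% threshold, so the alert fires
```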
Realistic savings benchmarks
| Cluster maturity | Typical overprovisioned idle | Achievable reduction |
|---|---|---|
| New cluster, defaults | 50–70% of spend | 40–60% with right-sizing |
| 1–2 years old, mixed ownership | 30–50% of spend | 25–40% with right-sizing + labels |
| Actively managed | 10–20% of spend | 5–15% ongoing tuning |
Unallocated idle savings are more binary — each node removed saves its full cost. A cluster running 20% unallocated capacity across 10 nodes can typically remove 2 nodes on the first pass, saving 20% of node cost immediately.