Understanding Idle Costs
Idle cost is the portion of your infrastructure spend that isn’t doing useful work. In Kubernetes environments, idle waste comes in two distinct forms with different causes and different solutions.
Type 1 — Overprovisioned resources
What it is: Capacity that a pod has requested but is not using.
When a pod sets requests.cpu: 1000m but only ever uses 100m, 900m of CPU is reserved on the node for that pod but sits idle. You’re paying for 1000m but only getting value from 100m.
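To make the numbers concrete, here is a sketch of a deployment with that shape (the workload name and image are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server              # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: example.com/api:1.0   # placeholder image
          resources:
            requests:
              cpu: 1000m        # reserved on the node, whether used or not
              memory: 512Mi
```

Each replica reserves 1000m but uses roughly 100m, so with three replicas the cluster holds 3000m for this workload while only ~300m does useful work: 2.7 cores of paid-for idle CPU.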
This is the most common form of idle waste, typically caused by:
- Cargo-cult defaults — Engineers copying resource requests from other deployments without tuning them
- Fear of OOMKills — Setting very high memory requests “just in case”
- Stale capacity planning — Applications that have shrunk, but whose resource requests were never updated to match
- Batch workload peaks — Jobs that need resources occasionally but hold them continuously
How to identify it
In CostPilot:
- Open Cost Explorer → select the Workload dimension
- Sort by Efficiency ascending
- The lowest-efficiency workloads have the largest gap between requests and actual usage
CostPilot’s Insights will also flag specific workloads with high overprovisioning, along with the estimated monthly savings from right-sizing.
How to reduce it
- Right-size resource requests — Set CPU and memory requests to approximately P95 of actual usage, plus a small buffer (10–20%)
- Use Vertical Pod Autoscaler (VPA) — VPA in recommendation mode automatically suggests right-sized requests based on observed usage
- Set resource limits — Prevents single workloads from consuming unbounded resources, improving overall cluster density
- Schedule request reviews — Make resource request tuning a recurring (e.g. quarterly) engineering task
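VPA's recommendation mode can be enabled with a manifest like the following sketch; with `updateMode: "Off"` it never evicts or resizes pods, it only publishes suggested requests in its status (the target deployment name is hypothetical):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server        # hypothetical workload to observe
  updatePolicy:
    updateMode: "Off"       # recommend only; never evict or resize pods
```

Once the VPA has observed enough usage, its suggestions appear under `status.recommendation.containerRecommendations` and can be applied manually during a request review.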
A common target is CPU utilisation of 60–70% of requests: enough headroom for traffic spikes without large amounts of waste.
Type 2 — Unallocated node capacity
What it is: Node capacity that no pod has claimed.
Even if every pod on a node uses exactly what it requests, nodes are never 100% schedulable: Kubernetes reserves part of each node (the gap between capacity and allocatable) for OS processes, the kubelet, and system pods. Beyond that, if your node pool has more total capacity than your workloads need, the excess is permanently idle.
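The system reservation is set on the kubelet. A sketch of the relevant `KubeletConfiguration` fields (the values are illustrative, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:          # capacity held back for OS daemons
  cpu: 100m
  memory: 256Mi
kubeReserved:            # capacity held back for the kubelet and container runtime
  cpu: 100m
  memory: 256Mi
evictionHard:            # memory kept free before the kubelet starts evicting
  memory.available: "100Mi"
```

Allocatable capacity is node capacity minus these reservations and eviction thresholds; only the allocatable portion is available to your pods.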
Common causes:
- Static node pools — A fixed number of nodes regardless of actual load
- Overprovisioned node types — Using large instance types when smaller ones would suffice
- Cluster autoscaler lag — Nodes provisioned for a peak that has since passed, but not yet scaled down
- Bin-packing inefficiency — Pod scheduling leaves gaps that are too small for any queued pod
How to identify it
In CostPilot:
- Go to Dashboard → look at the Idle cost breakdown
- Hover over the idle cost card to see the split between overprovisioned (Type 1) and unallocated (Type 2) waste
Alternatively, the Node Health insight category specifically reports on nodes with chronically low utilisation.
How to reduce it
- Enable Cluster Autoscaler — Automatically adds nodes when pods are pending and removes them when utilisation is low
- Right-size node pools — Use multiple node pools with different instance types to better match workload profiles (CPU-optimised, memory-optimised)
- Use spot instances — For workloads that can tolerate interruption, spot nodes reduce the cost of idle capacity significantly
- Tune bin-packing — Set Kubernetes scheduler policies to pack pods more tightly onto fewer nodes
- Set pod disruption budgets — Well-scoped PDBs tell the autoscaler which pods it may safely evict, letting it scale down nodes more aggressively
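For the last point, a minimal PodDisruptionBudget sketch: it declares how many replicas must stay up, so the Cluster Autoscaler can evict the rest during scale-down (the app label is hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2           # keep at least 2 replicas running during voluntary evictions
  selector:
    matchLabels:
      app: api-server       # hypothetical app label
```

Without any PDB, some autoscalers treat the workload conservatively; with one, scale-down proceeds as long as the budget is respected.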
Comparing the two types
| | Overprovisioned | Unallocated |
|---|---|---|
| Cause | Pod requests > actual usage | Node capacity > pod requests |
| Fix | Right-size resource requests | Right-size or autoscale node pool |
| Tooling | VPA recommendations | Cluster Autoscaler |
| Urgency | High (large cumulative waste) | High (pure unused cost) |
A note on headroom
Some idle cost is intentional and healthy. A cluster with zero idle capacity has no room to absorb traffic spikes or schedule new pods — you’ll hit resource exhaustion under load. A reasonable target is:
- 10–15% CPU headroom above current pod requests
- 15–20% memory headroom above current pod requests (memory is less elastic than CPU)
CostPilot’s efficiency scoring accounts for this — an efficiency of 80–85% is excellent, not a problem.
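One common way to make headroom explicit is the low-priority "placeholder" pattern from the cluster-autoscaler FAQ: a deployment of pause pods reserves spare capacity, and the scheduler preempts them the moment real workloads need the room (names and request sizes below are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10                  # lower than any real workload, so these pods are preempted first
globalDefault: false
description: "Placeholder pods that hold headroom capacity"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: headroom-placeholder
spec:
  replicas: 2
  selector:
    matchLabels:
      app: headroom-placeholder
  template:
    metadata:
      labels:
        app: headroom-placeholder
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # does nothing; just holds the reservation
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
```

When real pods preempt the placeholders, the pending placeholders in turn trigger the Cluster Autoscaler to add a node, keeping the headroom roughly constant.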
CostPilot calculates idle cost using a 2-hour rolling average of pod requests vs. node capacity, smoothing out short-term scheduling noise.