Right-size Workloads
Over-provisioned resource requests are the most common source of Kubernetes waste. A pod that requests 2 CPU but consistently uses 0.2 CPU pays for ten times more compute than it needs — and that cost compounds across every replica and every namespace.
This guide walks you through identifying over-provisioned workloads, interpreting CostPilot’s recommendations, calculating correct request values, and applying changes safely.
Prerequisites
- CostPilot agent installed with at least 48 hours of metrics collected
- Insights generated (triggered automatically at 5,000 metrics; check the Insights tab)
- kubectl access to apply resource changes
Step 1: Find inefficient workloads
- Navigate to Cost Explorer and set the Group by dimension to Workload.
- Sort the table by the Efficiency column in ascending order (lowest efficiency first).
- Look for workloads with an efficiency grade of D, E, or F.
CostPilot uses a weighted efficiency score based on actual CPU and memory usage relative to requests:
| Grade | Efficiency range | Interpretation |
|---|---|---|
| A | 90–100% | Well-tuned — requests closely match usage |
| B | 75–89% | Acceptable — minor overprovisioning |
| C | 60–74% | Moderate waste — worth investigating |
| D | 45–59% | Significant overprovisioning — action recommended |
| E | 30–44% | Severe waste — prioritise for right-sizing |
| F | 0–29% | Critical waste — very likely misconfigured |
Focus on high-cost, low-efficiency workloads first. A grade F workload that costs £5/month is less impactful than a grade D workload costing £500/month. Use the Cost column alongside Efficiency to prioritise.
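As a rough way to reproduce this prioritisation outside the UI, you can rank workloads by estimated monthly waste, approximating waste as cost × (1 − efficiency). A minimal Python sketch with made-up data; CostPilot's actual score is a weighted CPU/memory calculation, so treat this as an approximation:

```python
# Rank workloads by estimated monthly waste: cost x (1 - efficiency).
# The workload data below is hypothetical, for illustration only.
workloads = [
    {"name": "api-gateway",   "monthly_cost": 500.0, "efficiency": 0.52},  # grade D
    {"name": "debug-sidecar", "monthly_cost": 5.0,   "efficiency": 0.10},  # grade F
    {"name": "batch-runner",  "monthly_cost": 120.0, "efficiency": 0.68},  # grade C
]

def estimated_waste(w):
    # Fraction of spend not backed by actual usage, scaled by cost.
    return w["monthly_cost"] * (1.0 - w["efficiency"])

for w in sorted(workloads, key=estimated_waste, reverse=True):
    print(f'{w["name"]}: ~£{estimated_waste(w):.0f}/month wasted')
```

The grade D workload tops the list despite its better grade, which is exactly the point of weighing cost alongside efficiency.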
Step 2: Read the Insights recommendations
Navigate to the Insights tab. CostPilot automatically generates right-sizing recommendations for over-provisioned workloads.
Each insight includes:
- Current requests: What the pod is currently requesting
- Observed P95 usage: The 95th percentile of actual usage over the analysis window
- Recommended requests: The suggested new values (P95 × safety margin)
- Estimated monthly saving: How much you would save if you applied the recommendation
Insights are generated after every 5,000 metrics collected from your cluster. On a busy cluster this may happen multiple times per day. On a quieter cluster, allow a few hours after installation before expecting recommendations.
Step 3: Calculate correct requests
CostPilot’s recommended values use the following formula:
Recommended CPU request = P95 CPU usage × 1.15
Recommended memory request = P95 memory usage × 1.15
The 1.15 multiplier provides a 15% safety headroom above the observed peak. This is a sensible default for most stateless workloads.
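The formula is simple enough to script. A minimal sketch, assuming you round results up to tidy steps (10m for CPU, 50Mi for memory; the rounding steps are an assumption for readable manifests, while the 1.15 headroom is CostPilot's default):

```python
import math

def recommend_requests(p95_cpu_millicores, p95_mem_mib,
                       headroom=1.15, cpu_step=10, mem_step=50):
    """Apply the P95 x headroom formula, then round up to a tidy step."""
    cpu = math.ceil(p95_cpu_millicores * headroom / cpu_step) * cpu_step
    mem = math.ceil(p95_mem_mib * headroom / mem_step) * mem_step
    return f"{cpu}m", f"{mem}Mi"

print(recommend_requests(180, 420))  # → ('210m', '500Mi')
```

This reproduces the worked example used in Step 4 (180m CPU and 420Mi memory at P95).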
You can override the CPU-to-memory cost ratio per cluster in Settings → Clusters (select the cluster, then edit pricing configuration) if your workloads have an unusual CPU-to-memory profile.
Target ratios by workload type
| Workload type | CPU headroom | Memory headroom | Notes |
|---|---|---|---|
| Stateless API / web | ×1.15 | ×1.20 | Memory leaks are common — add extra headroom |
| Background workers | ×1.10 | ×1.15 | Lower burst headroom acceptable |
| Batch / jobs | ×1.05 | ×1.10 | Jobs are short-lived; tighter is fine |
| ML / data processing | ×1.25 | ×1.30 | High variance; more headroom needed |
| Databases (in-cluster) | Do not right-size | ×1.50 | CPU is bursty; memory limits risk OOMKill |
Do not right-size databases or stateful services by cutting memory requests aggressively: an OOMKill on a database pod risks data consistency. Their CPU usage is also bursty, so leave CPU requests generous rather than trimming them toward P95, and keep memory headroom at ×1.50 or higher.
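Applying the table programmatically might look like the following sketch. The type keys are hypothetical labels, and databases are deliberately absent, per the guidance above:

```python
# Headroom multipliers (CPU, memory) from the table above. Databases and
# other stateful services are intentionally excluded from right-sizing.
HEADROOM = {
    "stateless-api": (1.15, 1.20),
    "background":    (1.10, 1.15),
    "batch":         (1.05, 1.10),
    "ml-data":       (1.25, 1.30),
}

def sized_requests(workload_type, p95_cpu_m, p95_mem_mi):
    # Raises KeyError for types that should not be right-sized this way.
    cpu_h, mem_h = HEADROOM[workload_type]
    return round(p95_cpu_m * cpu_h), round(p95_mem_mi * mem_h)

print(sized_requests("batch", 100, 200))  # → (105, 220)
```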
Step 4: Apply changes
Update resource requests in your workload manifests or Helm values, then roll out the change.
```yaml
# Before
resources:
  requests:
    cpu: "2000m"
    memory: "2Gi"
  limits:
    cpu: "4000m"
    memory: "4Gi"
```
```yaml
# After (example — based on P95 usage of 180m CPU, 420Mi memory)
resources:
  requests:
    cpu: "210m"       # 180m × 1.15
    memory: "500Mi"   # 420Mi × 1.15, rounded up
  limits:
    cpu: "500m"       # Maintain a reasonable burst limit
    memory: "1Gi"     # Keep limit higher than request for safety
```
Roll out with a controlled strategy:
```shell
# Apply and monitor rollout
kubectl apply -f deployment.yaml
kubectl rollout status deployment/<name> -n <namespace>
```
VPA recommendation mode
If you are using the Vertical Pod Autoscaler (VPA), CostPilot’s recommendations align directly with VPA’s suggested values. You can use CostPilot Insights to validate VPA’s recommendations or to inform the minAllowed and maxAllowed bounds in your VPA policy.
Setting VPA to updateMode: "Off" (recommendation-only mode) lets you review VPA suggestions alongside CostPilot Insights before applying changes — a safe approach for production workloads.
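A recommendation-only VPA policy might look like the following sketch. The names and resource bounds are placeholders; the minAllowed and maxAllowed values are exactly where CostPilot Insights can inform your choices:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa              # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                # hypothetical target workload
  updatePolicy:
    updateMode: "Off"        # recommendation-only; VPA never evicts pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:          # example bounds; tune using Insights
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: "1"
          memory: 2Gi
```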
Step 5: Monitor after applying changes
After a rollout, watch for two things:
- OOMKill events — indicates memory requests are too low
- CPU throttling — indicates CPU limits are too low (requests may be fine but limits need raising)
```shell
# Watch for OOMKill events
kubectl get events --all-namespaces --field-selector reason=OOMKilling

# Check recent resource pressure events
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Events:"
```
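CPU throttling is quieter than an OOMKill: it shows up in the CFS counters that cAdvisor exports (container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total). A minimal sketch of the ratio check; the 25% threshold is a common rule of thumb, not a CostPilot default:

```python
def throttle_ratio(throttled_periods, total_periods):
    """Fraction of CFS scheduling periods in which the container was throttled."""
    if total_periods == 0:
        return 0.0
    return throttled_periods / total_periods

def limit_too_low(throttled_periods, total_periods, threshold=0.25):
    # Sustained throttling above the threshold suggests the CPU limit,
    # not the request, needs raising.
    return throttle_ratio(throttled_periods, total_periods) > threshold

print(limit_too_low(4_000, 10_000))  # → True (40% of periods throttled)
```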
Return to CostPilot after 24–48 hours. The efficiency score for the workload should improve. If it does not, check whether the pod is actually using the requests you set or whether an HPA is scaling out additional replicas.
Common pitfalls
- Limits without requests: If you set only limits, Kubernetes defaults the requests to equal the limits, silently giving the pod Guaranteed QoS and higher requests than you intended. Always set both explicitly.
- Setting requests = limits: This creates a Guaranteed QoS class, which prevents bursting. Only do this for latency-sensitive workloads where predictability matters more than flexibility.
- Right-sizing HPA-managed workloads: Right-sizing reduces per-pod cost but the HPA may scale out more pods. Review HPA min/max replica settings alongside request changes.
- Cutting memory limits along with requests: Reducing memory requests is generally safe; reducing memory limits risks OOMKills. Keep limits at roughly 2× the request for most workloads.
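The HPA pitfall above is easy to check with arithmetic: compare the cluster-wide total of requested CPU before and after the change, not just the per-pod values. The numbers here are illustrative:

```python
# Per-pod right-sizing can be partly offset by HPA scale-out.
def total_cpu_m(per_pod_millicores, replicas):
    return per_pod_millicores * replicas

before = total_cpu_m(2000, 3)   # 3 replicas at the old 2000m request
after = total_cpu_m(210, 12)    # HPA scaled out after right-sizing
print(f"before={before}m after={after}m")  # still a net reduction here
```

If the post-change total approaches the original, revisit the HPA's min/max replica settings alongside the request change.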