Right-size Workloads

Over-provisioned resource requests are the most common source of Kubernetes waste. A pod that requests 2 CPU but consistently uses 0.2 CPU pays for ten times more compute than it needs — and that cost compounds across every replica and every namespace.

This guide walks you through identifying over-provisioned workloads, interpreting CostPilot’s recommendations, calculating correct request values, and applying changes safely.

Prerequisites

  • CostPilot agent installed with at least 48 hours of metrics collected
  • Insights generated (triggered automatically at 5,000 metrics; check the Insights tab)
  • kubectl access to apply resource changes

Step 1: Find inefficient workloads

  1. Navigate to Cost Explorer and set the Group by dimension to Workload.
  2. Sort the table by the Efficiency column in ascending order (lowest efficiency first).
  3. Look for workloads with an efficiency grade of D, E, or F.

CostPilot uses a weighted efficiency score based on actual CPU and memory usage relative to requests:

| Grade | Efficiency range | Interpretation |
|-------|------------------|----------------|
| A | 90–100% | Well-tuned — requests closely match usage |
| B | 75–89% | Acceptable — minor overprovisioning |
| C | 60–74% | Moderate waste — worth investigating |
| D | 45–59% | Significant overprovisioning — action recommended |
| E | 30–44% | Severe waste — prioritise for right-sizing |
| F | 0–29% | Critical waste — very likely misconfigured |

Tip

Focus on high-cost, low-efficiency workloads first. A grade F workload that costs £5/month is less impactful than a grade D workload costing £500/month. Use the Cost column alongside Efficiency to prioritise.
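The grading bands can be expressed as a small scoring function. The equal CPU/memory weighting below is an illustrative assumption; the exact weights CostPilot uses are not spelled out in this guide:

```python
def efficiency_score(cpu_usage, cpu_request, mem_usage, mem_request,
                     cpu_weight=0.5, mem_weight=0.5):
    """Return a 0-100 efficiency score for a workload.

    Assumes a weighted mean of CPU and memory efficiency
    (usage / request, capped at 100%)."""
    cpu_eff = min(cpu_usage / cpu_request, 1.0)
    mem_eff = min(mem_usage / mem_request, 1.0)
    return 100 * (cpu_weight * cpu_eff + mem_weight * mem_eff)

def grade(score):
    """Map a 0-100 score onto the A-F bands from the table above."""
    bands = [(90, "A"), (75, "B"), (60, "C"), (45, "D"), (30, "E")]
    for threshold, letter in bands:
        if score >= threshold:
            return letter
    return "F"

# The pod from the introduction: requests 2 CPU / 2Gi, uses 0.2 CPU / 0.4Gi
score = efficiency_score(0.2, 2.0, 0.4, 2.0)
print(round(score), grade(score))  # → 15 F
```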


Step 2: Read the Insights recommendations

Navigate to the Insights tab. CostPilot automatically generates right-sizing recommendations for over-provisioned workloads.

Each insight includes:

  • Current requests: What the pod is currently requesting
  • Observed P95 usage: The 95th percentile of actual usage over the analysis window
  • Recommended requests: The suggested new values (P95 × safety margin)
  • Estimated monthly saving: How much you would save if you applied the recommendation
Note

Insights are generated after every 5,000 metrics collected from your cluster. On a busy cluster this may happen multiple times per day. On a quieter cluster, allow a few hours after installation before expecting recommendations.
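The P95 figure itself is simply a percentile over the window's usage samples. A minimal sketch, assuming the nearest-rank method (CostPilot may interpolate differently):

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile of a list of usage samples."""
    ordered = sorted(samples)
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[rank]

# 20 CPU samples in millicores: steady ~150-170m with one 900m spike
samples = [150] * 10 + [160] * 5 + [170] * 3 + [300, 900]
print(p95(samples))  # → 300
```

This is why P95 rather than peak usage is the sizing baseline: a single transient spike does not drive the recommendation.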


Step 3: Calculate correct requests

CostPilot’s recommended values use the following formula:

Recommended CPU request    = P95 CPU usage  × 1.15
Recommended memory request = P95 memory usage × 1.15

The 1.15 multiplier provides a 15% safety headroom above the observed peak. This is a sensible default for most stateless workloads.
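As code, the formula is a one-liner; rounding to whole millicores/MiB is an addition here for clean manifest values:

```python
def recommended_request(p95_usage, headroom=1.15):
    """Apply the safety headroom to a P95 usage figure."""
    return round(p95_usage * headroom)

# Using the Step 4 example figures:
print(recommended_request(180))  # CPU millicores → 207 (Step 4 rounds to 210m)
print(recommended_request(420))  # memory Mi → 483 (Step 4 rounds to 500Mi)
```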

You can override the CPU-to-memory cost ratio per cluster in Settings → Clusters (select the cluster, then edit pricing configuration) if your workloads have an unusual CPU-to-memory profile.

Target ratios by workload type

| Workload type | CPU headroom | Memory headroom | Notes |
|---------------|--------------|-----------------|-------|
| Stateless API / web | ×1.15 | ×1.20 | Memory leaks are common — add extra headroom |
| Background workers | ×1.10 | ×1.15 | Lower burst headroom acceptable |
| Batch / jobs | ×1.05 | ×1.10 | Jobs are short-lived; tighter is fine |
| ML / data processing | ×1.25 | ×1.30 | High variance; more headroom needed |
| Databases (in-cluster) | Do not right-size | ×1.50 | CPU is bursty; memory limits risk OOMKill |

Warning

Do not right-size databases or stateful services aggressively. An OOMKill on a database pod carries data-consistency risks. For these workloads, leave CPU requests generous (database CPU is bursty) and keep at least ×1.50 memory headroom rather than trimming memory towards P95.
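The table can be encoded as a lookup so recommendations come out per workload class. The type keys and function shape here are illustrative; the multipliers are taken straight from the table:

```python
# Headroom multipliers from the table above; None marks "do not right-size".
HEADROOM = {
    "stateless": {"cpu": 1.15, "memory": 1.20},
    "worker":    {"cpu": 1.10, "memory": 1.15},
    "batch":     {"cpu": 1.05, "memory": 1.10},
    "ml":        {"cpu": 1.25, "memory": 1.30},
    "database":  {"cpu": None, "memory": 1.50},  # leave CPU requests as-is
}

def recommend_requests(workload_type, p95_cpu_m, p95_mem_mi,
                       current_cpu_m, current_mem_mi):
    """Return (cpu_millicores, mem_mi) recommendations, keeping the
    current value wherever the table says not to right-size."""
    h = HEADROOM[workload_type]
    cpu = current_cpu_m if h["cpu"] is None else round(p95_cpu_m * h["cpu"])
    mem = current_mem_mi if h["memory"] is None else round(p95_mem_mi * h["memory"])
    return cpu, mem

print(recommend_requests("stateless", 180, 420, 2000, 2048))  # → (207, 504)
```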


Step 4: Apply changes

Update resource requests in your workload manifests or Helm values, then roll out the change.

# Before
resources:
  requests:
    cpu: "2000m"
    memory: "2Gi"
  limits:
    cpu: "4000m"
    memory: "4Gi"

# After (example — based on P95 usage of 180m CPU, 420Mi memory)
resources:
  requests:
    cpu: "210m"      # 180m × 1.15
    memory: "500Mi"  # 420Mi × 1.15, rounded up
  limits:
    cpu: "500m"      # Maintain a reasonable burst limit
    memory: "1Gi"    # Keep limit higher than request for safety

Roll out with a controlled strategy:

# Apply and monitor rollout
kubectl apply -f deployment.yaml
kubectl rollout status deployment/<name> -n <namespace>

VPA recommendation mode

If you are using the Vertical Pod Autoscaler (VPA), CostPilot’s recommendations align directly with VPA’s suggested values. You can use CostPilot Insights to validate VPA’s recommendations or to inform the minAllowed and maxAllowed bounds in your VPA policy.

Setting VPA to updateMode: "Off" (recommendation-only mode) lets you review VPA suggestions alongside CostPilot Insights before applying changes — a safe approach for production workloads.
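A minimal VPA policy in recommendation-only mode might look like the following; the workload name and the minAllowed/maxAllowed figures are placeholders to fill in from your own Insights:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa            # placeholder name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api              # placeholder workload
  updatePolicy:
    updateMode: "Off"      # recommend only; never evict pods to apply
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:        # bound using CostPilot Insights
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: "1"
          memory: 2Gi
```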


Step 5: Monitor after applying changes

After a rollout, watch for two things:

  1. OOMKill events — indicates memory requests are too low
  2. CPU throttling — indicates CPU limits are too low (requests may be fine but limits need raising)
# Watch for OOMKill
kubectl get events --all-namespaces --field-selector reason=OOMKilling

# Check recent resource pressure events
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Events:"

Return to CostPilot after 24–48 hours. The efficiency score for the workload should improve. If it does not, check whether the pod is actually using the requests you set, or whether an HPA is scaling out additional replicas.


Common pitfalls

  • Limits without requests: If you set only limits, Kubernetes defaults requests to equal limits, silently giving you a Guaranteed QoS class and full-limit reservations. Always set both explicitly.
  • Setting requests = limits: This creates a Guaranteed QoS class, which prevents bursting. Only do this for latency-sensitive workloads where predictability matters more than flexibility.
  • Right-sizing HPA-managed workloads: Right-sizing reduces per-pod cost but the HPA may scale out more pods. Review HPA min/max replica settings alongside request changes.
  • Ignoring memory limits: Reducing memory requests affects scheduling, not the kill threshold; reducing memory limits is what risks OOMKill. Keep limits at roughly 2× the request for most workloads.