QoS Classes & Cost Impact

Kubernetes assigns every pod a Quality of Service (QoS) class based on how its resource requests and limits are configured. This class determines scheduling priority, eviction order — and crucially for CostPilot — which cost allocation method is used.

Understanding QoS classes helps you interpret why some pods appear cheap in CostPilot when they are actually consuming significant resources, and why production workloads should always have resource requests set.


The three QoS classes

Guaranteed

A pod is Guaranteed when every container in the pod has:

  • CPU requests and limits set
  • Memory requests and limits set
  • Requests equal to limits for both resources

resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "500m"    # must equal request
    memory: "256Mi" # must equal request

Cost allocation: CostPilot allocates cost based on the pod’s requests (which equal the limits). The allocation is stable and predictable — it does not fluctuate with actual usage.
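Because allocation is request-based, a Guaranteed pod's cost can be computed up front. A minimal sketch of the arithmetic, using made-up unit prices (not CostPilot's actual rates):

```python
# Illustrative request-based costing for a Guaranteed pod.
# Prices are assumptions for the example; real rates depend on the node type.
CPU_PRICE_PER_CORE_HOUR = 0.032   # assumed $/core-hour
MEM_PRICE_PER_GIB_HOUR = 0.004    # assumed $/GiB-hour

def guaranteed_monthly_cost(cpu_request_cores: float, mem_request_gib: float,
                            hours: float = 730.0) -> float:
    """Cost is requests x price x time; actual usage never enters the formula."""
    return hours * (cpu_request_cores * CPU_PRICE_PER_CORE_HOUR
                    + mem_request_gib * MEM_PRICE_PER_GIB_HOUR)

# The example pod above: 500m CPU, 256Mi memory.
print(round(guaranteed_monthly_cost(0.5, 0.25), 2))  # → 12.41
```

Whether the pod idles or runs flat out, the number is the same; that is what "stable and predictable" means here.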

Eviction order: Guaranteed pods are evicted last. The node evicts BestEffort first, then Burstable, then Guaranteed.

When to use: Latency-sensitive, production-critical workloads where you need predictable performance and are willing to pay for reserved capacity. Databases, payment processing, real-time APIs.


Burstable

A pod is Burstable when at least one container has a request set, but the pod does not meet the Guaranteed criteria (requests ≠ limits, or only some resources have requests).

resources:
  requests:
    cpu: "200m"    # request set
    memory: "128Mi"
  limits:
    cpu: "1000m"   # limit higher than request — allows bursting
    memory: "512Mi"

Cost allocation: CostPilot allocates cost based on the requests that are set. For the resources where no request is set, CostPilot uses actual usage as a fallback.
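The per-resource fallback can be sketched as: bill the request where one is set, otherwise bill observed usage. This is a simplification of whatever CostPilot does internally, not its actual code:

```python
def allocation_basis(requests: dict, usage: dict) -> dict:
    """Pick the billed quantity per resource for a Burstable pod.

    requests: resources with a request set, e.g. {"cpu": 0.2} (cores)
    usage:    observed usage for every resource, e.g. {"cpu": 0.05, "memory": 0.3}
    """
    return {res: requests.get(res, used) for res, used in usage.items()}

# CPU has a 200m request, memory has none: CPU bills at the request,
# memory falls back to actual usage.
print(allocation_basis({"cpu": 0.2}, {"cpu": 0.05, "memory": 0.3}))
# → {'cpu': 0.2, 'memory': 0.3}
```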

Eviction order: Burstable pods are evicted after BestEffort but before Guaranteed. Among them, pods whose usage exceeds their requests by the largest amount are evicted first.

When to use: Most general-purpose workloads — stateless services, background workers, web applications where some burst capacity is useful but reserved baseline is still needed.


BestEffort

A pod is BestEffort when no container has any resource requests or limits set:

# No resources block at all, or:
resources: {}

Cost allocation: CostPilot uses actual observed usage for BestEffort pods. This is the only allocation method that reflects true consumption — because there is no request to use as a baseline.

Eviction order: BestEffort pods are evicted first when the node comes under memory or disk pressure. (CPU contention does not trigger eviction — CPU is compressible, so starved pods are throttled instead — but BestEffort pods are last in line for CPU when that happens.)

When to use: Only for non-critical, interruptible workloads where scheduling flexibility is more important than reliability. Development tooling, ephemeral jobs.
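The three definitions above can be condensed into a small classifier. This is a sketch of the rules as described here, not code from Kubernetes itself (in particular, it ignores the default that a limit with no request implies a matching request):

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' resources blocks.

    Each container is a dict like {"requests": {...}, "limits": {...}}.
    """
    # BestEffort: no container sets any request or limit.
    if not any(c.get("requests") or c.get("limits") for c in containers):
        return "BestEffort"
    # Guaranteed: every container sets cpu+memory requests equal to limits.
    guaranteed = all(
        c.get("requests", {}).get(r) is not None
        and c.get("requests", {}).get(r) == c.get("limits", {}).get(r)
        for c in containers
        for r in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"

print(qos_class([{"requests": {"cpu": "500m", "memory": "256Mi"},
                  "limits":   {"cpu": "500m", "memory": "256Mi"}}]))
# → Guaranteed
```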

Warning

BestEffort pods may appear cheaper in CostPilot than equivalent Burstable pods, because they are only charged for what they actually use. However, they are the most likely to be evicted and the least protected against resource starvation. Do not run production workloads as BestEffort.


How QoS class affects CostPilot allocation

| QoS class  | Allocation basis     | Cost stability | Efficiency score                                |
|------------|----------------------|----------------|-------------------------------------------------|
| Guaranteed | Requests (= limits)  | Stable         | Always 100% if usage = request                  |
| Burstable  | Requests (where set) | Mostly stable  | Varies based on actual usage vs requests        |
| BestEffort | Actual usage         | Fluctuates     | Not calculated (no requests to compare against) |
Note

BestEffort pods do not contribute to efficiency scores because efficiency is defined as actual usage ÷ requests. With no requests, the ratio is undefined. These pods appear in cost breakdowns but are excluded from the efficiency grade calculation.
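In code terms, the efficiency calculation has to special-case the missing request. A sketch (CostPilot's real scoring may differ):

```python
from typing import Optional

def efficiency(usage: float, request: Optional[float]) -> Optional[float]:
    """usage / request, or None when there is no request (BestEffort)."""
    if not request:            # None (or zero): the ratio is undefined
        return None
    return usage / request

print(efficiency(0.25, 0.5))   # Burstable/Guaranteed: 0.5, i.e. 50% efficient
print(efficiency(0.25, None))  # BestEffort: None, excluded from the grade
```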


Why BestEffort looks “cheap” but is risky

A BestEffort pod that idles at 5m CPU and a few MiB of memory costs almost nothing in CostPilot. This is accurate: it is only consuming those resources on the node. But consider what happens under load:

  1. The pod's memory usage climbs to 2 GiB (it has no limit).
  2. The node comes under memory pressure.
  3. Kubernetes evicts BestEffort pods first.
  4. Your workload is terminated with no warning.

(A CPU spike plays out differently: CPU is compressible, so Kubernetes throttles rather than evicts, but BestEffort pods are still the first to be starved of CPU time under contention.)

Meanwhile, the same pod running as Burstable with a memory request would have shown up as cost in CostPilot, but, as long as its usage stayed within that request, it would have outranked every BestEffort pod in the eviction order and survived the pressure event.

The risk is invisible in cost data — BestEffort cost looks low right up until the pod is evicted. There is no warning signal.


Recommendations

Set requests on all production pods

Even a small request (e.g. cpu: "10m", memory: "32Mi") moves a pod from BestEffort to Burstable. This:

  • Protects the pod from first-pass eviction
  • Gives CostPilot a stable allocation basis
  • Contributes to the efficiency score, surfacing overprovisioning
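Concretely, the change is just a minimal requests block on each container. The values below are the floor from the example above; size real requests from observed usage:

```yaml
# Minimal requests: enough to lift the pod from BestEffort to Burstable.
resources:
  requests:
    cpu: "10m"
    memory: "32Mi"
```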

Use Guaranteed QoS for latency-sensitive workloads

Setting requests = limits reserves the pod's full CPU and memory allotment on the node and places it last in the eviction order, so your most critical services never compete for overcommitted capacity. Accept the higher nominal cost in exchange for predictable performance.

Audit BestEffort pods regularly

In Cost Explorer, filter to the QoS class dimension and look for BestEffort pods in production namespaces. Any BestEffort pod in production or staging is a reliability risk worth addressing.

# Find BestEffort pods in all namespaces
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[] | select(.status.qosClass=="BestEffort") | [.metadata.namespace, .metadata.name] | @tsv'

Align with VPA

If you use the Vertical Pod Autoscaler, it will only manage pods that have an existing resources block. BestEffort pods are outside VPA’s scope. Setting at least minimal requests makes all your pods eligible for VPA management.