Upgrade the Agent

This guide covers how to upgrade the CostPilot agent to a new version using either Helm or Kustomize. The Operator handles rolling restarts of agent pods automatically, so you do not need to manage agent pods directly.


Before you upgrade

  1. Check the release notes for breaking changes.
  2. Verify the current version:
kubectl get pods -n costpilot -l app=cost-pilot-operator \
  -o jsonpath='{.items[0].spec.containers[0].image}'

Upgrading with Helm

1. Update the chart repository

helm repo update

2. Check the latest chart version

helm search repo costpilot/agent --versions

3. Upgrade

helm upgrade costpilot costpilot/agent \
  --namespace costpilot \
  --reuse-values

--reuse-values preserves your existing values (API key secret name, region, etc.) without you needing to re-specify them.
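Before running the upgrade, it can help to see which values --reuse-values will carry over and to render the upgrade without applying it. The sketch below wraps two standard Helm commands (helm get values and helm upgrade --dry-run) in a shell function. The function name preview_upgrade is illustrative and not part of CostPilot; the release name and namespace are the ones used in this guide.

```shell
# Illustrative helper (not part of CostPilot): preview a Helm upgrade.
# Assumes the release name "costpilot" and namespace "costpilot" used in this guide.
preview_upgrade() {
  # Show the user-supplied values stored with the current release
  helm get values costpilot --namespace costpilot || return 1

  # Render the upgrade against the cluster without changing anything
  helm upgrade costpilot costpilot/agent \
    --namespace costpilot \
    --reuse-values \
    --dry-run
}
```

Call preview_upgrade after helm repo update. Nothing is modified until you run the real helm upgrade.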

To pin to a specific chart version:

helm upgrade costpilot costpilot/agent \
  --namespace costpilot \
  --version 0.2.0 \
  --reuse-values

4. Verify

kubectl rollout status deployment/cost-pilot-operator -n costpilot
kubectl get pods -n costpilot

The Operator will reconcile within a few seconds and update the agent ReplicaSet to use the new image.
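To confirm that the reconcile has finished, you can compare the image on every agent pod with the one you expect. This is a minimal sketch, assuming agent pods carry the label app=costpilot-agent (the selector used in the log commands later in this guide) and that the agent is the first container in each pod. The name agents_on_image is hypothetical.

```shell
# Illustrative helper: succeed only if every agent pod reports the expected image.
# Assumption: agent pods are labelled app=costpilot-agent and the agent is container 0.
agents_on_image() {
  expected="$1"
  images=$(kubectl get pods -n costpilot -l app=costpilot-agent \
    -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\n"}{end}')

  [ -n "$images" ] || { echo "no agent pods found" >&2; return 1; }

  # grep -Fxv selects lines that are NOT exactly the expected image;
  # any such line means at least one pod is still on another image.
  if printf '%s\n' "$images" | grep -Fxv -- "$expected" >/dev/null; then
    echo "some agent pods are not on $expected yet" >&2
    return 1
  fi
  echo "all agent pods are on $expected"
}
```

For example, agents_on_image ghcr.io/smrt-devops/cost-pilot/agent:v0.2.0 checks for the tag used elsewhere in this guide. If your deployment references images by digest, pass the digest form instead.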


Upgrading with Kustomize

Option A — Update your image tag (overlay)

In your overlay’s kustomization.yaml:

images:
  - name: ghcr.io/smrt-devops/cost-pilot/agent
    newTag: "v0.2.0"    # update to the new version

Then apply:

kubectl apply -k kustomize/overlays/production/
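If you want to check the result before applying, kubectl can render the overlay locally and diff it against the cluster. The sketch below uses the built-in kubectl kustomize and kubectl diff subcommands. The function name preview_overlay is illustrative, and the default path is the overlay directory shown above.

```shell
# Illustrative helper: preview what an overlay will change before applying it.
preview_overlay() {
  overlay="${1:-kustomize/overlays/production/}"

  # Render the overlay locally and list the image references it produces
  kubectl kustomize "$overlay" | grep 'image:'

  # Compare the rendered manifests with the live cluster state.
  # kubectl diff exits 1 when differences exist, so that status is not an error here.
  kubectl diff -k "$overlay" || [ $? -eq 1 ]
}
```

The image lines should show the new tag, and the diff should be limited to the changes you expect from the version bump.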

Option B — Update the base reference

If you reference the remote base by Git tag:

# my-cluster/kustomization.yaml
resources:
  - github.com/smrt-devops/cost-pilot-agent//kustomize/base?ref=v0.2.0

Re-apply:

kubectl apply -k my-cluster/

Verify

kubectl rollout status deployment/cost-pilot-operator -n costpilot
kubectl get pods -n costpilot

What happens during an upgrade

  1. Helm or kubectl apply updates the Operator Deployment image.
  2. Kubernetes performs a rolling restart of the Operator pod.
  3. The new Operator pod starts and reads its own image SHA from its pod status.
  4. The Operator compares its image SHA against the SHA currently set on the agent pods.
  5. If they differ, the Operator updates the agent ReplicaSet’s pod template with the new image — triggering a rolling restart of agent pods.
  6. Agent pods are replaced one at a time. The remaining running replicas continue collecting metrics during the rollout.

Note

Metric collection continues uninterrupted during an upgrade. With three agent replicas and leader election, at least one replica is always collecting while others restart.
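One way to observe this during a rollout is to count the agent pods that are currently Ready. The following is a sketch, assuming agent pods carry the label app=costpilot-agent (the selector used in the log commands later in this guide). The name ready_agents is hypothetical.

```shell
# Illustrative helper: print the number of agent pods whose Ready condition is True.
# Assumption: agent pods are labelled app=costpilot-agent.
ready_agents() {
  kubectl get pods -n costpilot -l app=costpilot-agent \
    -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
    | grep -c '^True$' || true   # grep -c still prints 0 when nothing matches
}
```

Running it in a loop while the upgrade is in progress (for example, while sleep 2; do ready_agents; done) should never print 0 if the rollout behaves as described.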


Rollback

Helm rollback

helm rollback costpilot -n costpilot

To roll back to a specific revision:

helm history costpilot -n costpilot   # list revisions
helm rollback costpilot 2 -n costpilot

Kustomize rollback

Update the image tag back to the previous version and re-apply.
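As a sketch of that step, the function below rewrites the agent image tag in an overlay and re-applies it. It assumes the standalone kustomize CLI is installed (kubectl alone does not provide the edit subcommand) and that your overlay pins the image name shown in Option A. The function name and both arguments are illustrative; the previous tag is whatever version you were running before the upgrade.

```shell
# Illustrative helper: pin the agent image back to an earlier tag and re-apply.
# Requires the standalone kustomize CLI for the "edit set image" subcommand.
rollback_overlay() {
  overlay="$1"        # e.g. kustomize/overlays/production/
  previous_tag="$2"   # the tag you were running before the upgrade

  [ -n "$overlay" ] && [ -n "$previous_tag" ] || {
    echo "usage: rollback_overlay <overlay-dir> <previous-tag>" >&2
    return 2
  }

  # Rewrite newTag for the agent image in the overlay's kustomization.yaml
  (cd "$overlay" && kustomize edit set image \
    "ghcr.io/smrt-devops/cost-pilot/agent:$previous_tag") || return 1

  # Apply the overlay with the restored tag
  kubectl apply -k "$overlay"
}
```

If the overlay is tracked in Git, reverting the commit that bumped the tag and re-applying achieves the same result. If you upgraded by changing the base reference (Option B), set ref back to the previous Git tag instead.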


Verifying metric collection after upgrade

After the upgrade completes, confirm metrics are flowing:

# Check agent logs for shipping activity
kubectl logs -n costpilot -l app=costpilot-agent --prefix --tail=20

# Check the Operator reconciled successfully
kubectl logs -n costpilot -l app=cost-pilot-operator --tail=20

You should see metrics shipped successfully in the agent logs within 15–30 seconds of the agents coming up. The CostPilot Dashboard shows a warning banner if no metrics are received for 15 minutes. If the banner appears, check the agent and Operator logs with the commands above.
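To make this check scriptable, you can search recent agent logs for the shipping message. This is a sketch only: the default search text "metrics shipped" is an assumption based on the wording in this guide, so adjust it to match your agent's actual log output. The function name metrics_flowing is hypothetical.

```shell
# Illustrative helper: succeed if recent agent logs contain the shipping message.
# Assumption: the default pattern mirrors this guide's wording, not a confirmed log format.
metrics_flowing() {
  pattern="${1:-metrics shipped}"

  # Read the last two minutes of logs from all agent pods and print matching lines;
  # the function's exit status is grep's (0 when at least one line matches).
  kubectl logs -n costpilot -l app=costpilot-agent \
    --prefix --since=2m --tail=200 \
    | grep -i -- "$pattern"
}
```

A non-zero exit status shortly after the rollout is not necessarily a failure. Allow the 15–30 seconds mentioned above before retrying.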