Alerts

ℹ Paid feature

Alerts are available on paid plans (Pro and Max). Free accounts can view the configuration page but cannot activate alert rules.

CostPilot Alerts notify you when costs or resource metrics cross configured thresholds — before you get an unexpected cloud bill. Alerts can be scoped to an entire account, a specific cluster, a namespace, or a label dimension.

Rules are evaluated every minute. When a rule fires, CostPilot stores the alert, delivers it to all configured channels, and tracks the full delivery history.

Alert rule types

Cost spike

Triggers when cost increases by more than a specified percentage compared to the previous equivalent period.

Example: Alert when any namespace’s weekly cost increases by more than 30% compared to last week.
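As a rough illustration of the comparison a cost spike rule performs, the sketch below computes the percentage change between two equivalent periods and checks it against a threshold. The function names and the handling of a zero baseline are assumptions made for this example, not CostPilot's actual implementation.

```python
def percent_change(current: float, previous: float) -> float:
    """Return the percentage change from `previous` to `current`."""
    if previous == 0:
        # Assumption: with no baseline spend, any new spend counts as an
        # unbounded increase, and zero spend counts as no change.
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous * 100.0


def cost_spike_fires(current: float, previous: float, threshold_pct: float) -> bool:
    """True when cost rose by more than `threshold_pct` versus the previous period."""
    return percent_change(current, previous) > threshold_pct


# A namespace that cost 1,000 last week and 1,350 this week is up 35%,
# which is above a 30% threshold.
print(cost_spike_fires(1350.0, 1000.0, 30.0))
```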

Budget threshold

Triggers when cumulative spend or projected monthly cost exceeds a fixed budget.

Example: Alert when the production cluster is projected to exceed its £4,000 monthly budget.
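This page does not describe how the monthly projection is calculated. The sketch below assumes a simple straight-line extrapolation of month-to-date spend, purely to show the shape of the check; `projected_monthly_cost` and `budget_exceeded` are hypothetical names.

```python
import calendar
from datetime import date


def projected_monthly_cost(spend_to_date: float, today: date) -> float:
    """Extrapolate month-to-date spend linearly to the end of the month.

    Assumption: `spend_to_date` covers day 1 through `today` inclusive.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def budget_exceeded(spend_to_date: float, today: date, monthly_budget: float) -> bool:
    """True when cumulative or projected spend is above the budget."""
    projected = projected_monthly_cost(spend_to_date, today)
    return spend_to_date > monthly_budget or projected > monthly_budget


# Halfway through February 2026 with 2,115.25 spent so far, the straight-line
# projection is about 4,230.50, which is above a 4,000 budget.
print(budget_exceeded(2115.25, date(2026, 2, 14), 4000.0))
```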

Metric threshold

Triggers when a resource metric (CPU, memory, idle cost) crosses a defined value.

Example: Alert when cluster idle cost exceeds £500/day, or when CPU efficiency drops below 40%.

Idle resource

Triggers when persistently idle or unscheduled resources are detected — workloads consuming reserved capacity with no meaningful utilisation.

Example: Alert when a namespace has had more than 25% idle cost for three consecutive evaluation periods.
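The "consecutive evaluation periods" condition in the example can be sketched as a check over the most recent readings. This is a minimal illustration that assumes idle cost is available as a fraction of total cost per period; the function name and defaults are hypothetical.

```python
def idle_rule_fires(
    idle_fractions: list[float],
    threshold: float = 0.25,
    periods: int = 3,
) -> bool:
    """True when the last `periods` readings are all above `threshold`.

    `idle_fractions` is ordered oldest to newest, one value per evaluation
    period, each expressed as idle cost divided by total cost.
    """
    if periods <= 0:
        raise ValueError("periods must be positive")
    recent = idle_fractions[-periods:]
    # Require a full window so a rule does not fire on partial history.
    return len(recent) == periods and all(f > threshold for f in recent)


print(idle_rule_fires([0.10, 0.30, 0.28, 0.40]))  # last three all above 25%
print(idle_rule_fires([0.30, 0.28, 0.20]))        # latest reading dipped below
```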

Anomaly detection

Triggers when CostPilot detects a statistically significant deviation from recent baseline behaviour. No manual threshold required — the engine learns from your historical spend patterns.

Example: Alert when spend for a cluster deviates by more than two standard deviations from its rolling 7-day baseline.
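The example above corresponds to a basic z-score test. The sketch below shows that test using Python's standard library. It is an assumption about the general technique only, since the actual CostPilot engine is not described on this page; `is_anomalous` is a hypothetical name.

```python
from statistics import mean, stdev


def is_anomalous(baseline: list[float], current: float, max_sigma: float = 2.0) -> bool:
    """True when `current` is more than `max_sigma` standard deviations
    away from the mean of `baseline` (for example, the last 7 daily totals)."""
    if len(baseline) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    if sigma == 0:
        # Perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) > max_sigma * sigma


last_seven_days = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 100.0]
print(is_anomalous(last_seven_days, 110.0))  # far outside the usual range
print(is_anomalous(last_seven_days, 101.0))  # within normal variance
```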

Delivery channels

Alerts can be delivered to any combination of the following channels. Channels are configured separately at Settings → Alerts → Channels and then referenced from alert rules.

Channel	Notes
Slack	Incoming webhook URL. Rich-formatted messages with severity colour coding.
Microsoft Teams	Incoming webhook URL. Adaptive card payload.
PagerDuty	Integration key (Events API v2). Severity maps to PagerDuty urgency.
Datadog	API key + app key. Posts as a Datadog event.
Grafana	Grafana API token + URL. Posts to a Grafana annotation or alert.
AlertManager	Endpoint URL. Posts in Alertmanager-compatible format.
Jira	API token + project key. Creates or updates a Jira issue.
Webhook	Any HTTPS endpoint. JSON POST with HMAC-SHA256 signature.

Channel filters

Each channel can be restricted to only receive alerts that match specific criteria:

Severity filter — Only deliver critical, warning, or info alerts (or any combination)
Category filter — Only deliver alerts from specific rule categories (e.g. cost, resource)
Cluster filter — Only deliver alerts from specific clusters

An empty filter means “deliver all”. Filters are evaluated with AND logic: an alert must pass every non-empty filter on a channel to be delivered.
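The filter semantics described above (an empty filter passes everything, non-empty filters are combined with AND) can be sketched as follows. The `ChannelFilters` class and the alert field names `severity`, `category`, and `cluster` are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelFilters:
    # An empty set means "no restriction" for that dimension.
    severities: set[str] = field(default_factory=set)
    categories: set[str] = field(default_factory=set)
    clusters: set[str] = field(default_factory=set)


def should_deliver(alert: dict, filters: ChannelFilters) -> bool:
    """True when the alert passes every non-empty filter on the channel."""
    checks = [
        (filters.severities, alert.get("severity")),
        (filters.categories, alert.get("category")),
        (filters.clusters, alert.get("cluster")),
    ]
    # AND logic: each filter is either empty or must contain the alert's value.
    return all(not allowed or value in allowed for allowed, value in checks)


alert = {"severity": "critical", "category": "cost", "cluster": "prod-eu-1"}
pager_only = ChannelFilters(severities={"critical"}, clusters={"prod-eu-1"})
print(should_deliver(alert, pager_only))
```

Note that under this reading an alert with no cluster (for example, an account-scoped alert) would not pass a non-empty cluster filter.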

Verification

After saving a channel, use the Test button to send a test message. CostPilot records the result and marks the channel as verified if the test succeeds.

Configuring an alert rule

Navigate to Settings → Alerts → Rules and click New alert rule.

Step 1 — Choose scope

Select what the alert monitors:

Account — Total spend across all clusters
Cluster — A specific cluster
Namespace — A specific namespace within a cluster
Label — A specific label value (e.g. team=payments)

Step 2 — Set the threshold

Configure the trigger condition for your chosen rule type. For cost spike and budget threshold rules:

Threshold value — Amount in your account’s display currency, or percentage
Evaluation period — How often CostPilot evaluates the condition
Comparison window — For percentage change rules: compare to yesterday, last week, or last month

Step 3 — Set severity

Choose a severity level for alerts fired by this rule:

Critical — Requires immediate attention
Warning — Investigate soon
Info — Informational, no immediate action required

Step 4 — Save and activate

Give the rule a descriptive name and save. Rules activate as soon as they are saved, so use Dry run first to evaluate the rule against current data.

Viewing and managing fired alerts

The /alerts route provides an operational view of all fired alerts across four tabs:

Overview

A summary dashboard showing alerts fired in the last 24 hours, delivery success rate, active rule count, and unhealthy channel count. A recent alerts timeline gives a quick view of the last 20 fired alerts with one-click acknowledge and resolve actions.

Active

All unresolved alerts that need attention. Each row shows:

Severity and rule name
The message that triggered the alert
Cluster scope (if applicable)
Time since firing
Delivery status icons per channel (✓ sent, ✕ failed, ↺ retrying)

Row actions:

Acknowledge — Mark as seen and under investigation
Resolve — Mark as resolved
Retry (on failed delivery rows) — Re-attempt delivery to a specific channel

Expand any row to see the full per-channel delivery detail including error messages.

History

Paginated, filterable list of all fired alerts. Filter by severity, alert status (active / acknowledged / resolved), and cluster. Expand rows to view full delivery attempt detail.

Channel Health

A health card for each configured delivery channel showing:

7-day delivery success rate (colour-coded: green ≥ 90%, amber ≥ 70%, red < 70%)
Total deliveries and failure count
Time since last successful delivery
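The colour bands listed above map to a simple threshold function. A minimal sketch, with `health_colour` as a hypothetical name and the success rate expressed as a percentage:

```python
def health_colour(success_rate_pct: float) -> str:
    """Map a 7-day delivery success rate to the colour bands above."""
    if success_rate_pct >= 90:
        return "green"
    if success_rate_pct >= 70:
        return "amber"
    return "red"


for rate in (98.5, 75.0, 42.0):
    print(rate, health_colour(rate))
```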

Webhook payload

Webhook channel deliveries are sent as a JSON POST signed with HMAC-SHA256. The signature is provided in the X-CostPilot-Signature header as the hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret.

{
  "alert_id": "alrt_01hxyz",
  "rule_name": "Production cluster daily budget",
  "rule_type": "budget_threshold",
  "severity": "critical",
  "triggered_at": "2026-02-28T14:23:00Z",
  "scope": "cluster",
  "scope_value": "prod-eu-1",
  "threshold": 4000.00,
  "actual_value": 4230.50,
  "currency": "EUR",
  "message": "Production cluster daily spend is projected to exceed the monthly budget."
}
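A receiving service can check the signature before trusting the payload. The sketch below uses only Python's standard library and assumes the header carries the bare hex digest with no prefix such as `sha256=`, and that the secret is a UTF-8 string. `verify_signature` and the example secret are hypothetical.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check the X-CostPilot-Signature value against the raw request body."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = signature_header.strip().lower()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))


# Simulate a delivery: sign a body the way the sender would, then verify it.
body = b'{"alert_id": "alrt_01hxyz", "severity": "critical"}'
secret = "example-webhook-secret"  # hypothetical value for illustration
signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

print(verify_signature(body, signature, secret))         # True
print(verify_signature(body + b" ", signature, secret))  # False: body was altered
```

Verify against the exact bytes received, before parsing the JSON. Re-serialising the parsed payload can change whitespace or key order and produce a different digest.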

ℹ Note

The currency field reflects your account’s configured display currency (set in Settings → General). Threshold values are stored and delivered in the same currency.

Rate limiting

To protect external services, CostPilot enforces a delivery rate limit of 60 notifications per minute per tenant. Deliveries that exceed this limit are marked as failed and are not retried automatically. You can manually retry failed deliveries from the Active tab.
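How the limit is enforced internally is not documented on this page. To illustrate the behaviour, the sketch below models a per-tenant sliding window of 60 deliveries per 60 seconds, where a rejected delivery would be recorded as failed rather than queued. The class name and the choice of a sliding window are assumptions.

```python
from collections import defaultdict, deque


class TenantRateLimiter:
    """Sliding-window limiter: at most `limit` deliveries per window per tenant."""

    def __init__(self, limit: int = 60, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._sent: dict[str, deque] = defaultdict(deque)

    def allow(self, tenant_id: str, now: float) -> bool:
        """Record and allow a delivery at time `now`, or reject it."""
        sent = self._sent[tenant_id]
        # Drop timestamps that have aged out of the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) >= self.limit:
            return False  # caller would mark this delivery as failed
        sent.append(now)
        return True


limiter = TenantRateLimiter()
results = [limiter.allow("tenant-a", now=0.0) for _ in range(61)]
print(results.count(True), results.count(False))  # 60 allowed, 1 rejected
```

If you expect bursts of alerts, tighter channel filters or fewer overlapping rules may keep deliveries under the limit.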

Delivery guarantees

CostPilot records every delivery attempt per channel with a full audit trail (status, timestamp, error message). Failed deliveries are visible in the alert history and can be retried manually.

Each channel delivery attempt is independent — a failure on one channel does not affect delivery to others.

✦ Tip

Start with a daily budget alert on your largest cluster at 110% of your expected daily spend. This catches anomalies without generating noise from normal variance.