Alerts
Alerts are available on paid plans (Pro and Max). Free accounts can view the configuration page but cannot activate alert rules.
CostPilot Alerts notify you when costs or resource metrics cross configured thresholds — before you get an unexpected cloud bill. Alerts can be scoped to an entire account, a specific cluster, a namespace, or a label dimension.
Rules are evaluated every minute. When a rule fires, CostPilot stores the alert, delivers it to all configured channels, and tracks the full delivery history.
Alert rule types
Cost spike
Triggers when cost increases by more than a specified percentage compared to the previous equivalent period.
Example: Alert when any namespace’s weekly cost increases by more than 30% compared to last week.
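The comparison behind a cost spike rule can be sketched as a plain percentage change. This is an illustrative sketch only: the function names are hypothetical, and treating a zero-cost previous period as an error is an assumption, not documented CostPilot behaviour.

```python
def cost_change_percent(current: float, previous: float) -> float:
    """Percentage change of the current period's cost versus the previous period."""
    if previous <= 0:
        # Assumption: a zero-cost baseline has no meaningful percentage change.
        raise ValueError("previous period cost must be positive")
    return (current - previous) * 100.0 / previous


def is_cost_spike(current: float, previous: float, threshold_percent: float) -> bool:
    """True when cost rose by strictly more than the configured percentage."""
    return cost_change_percent(current, previous) > threshold_percent
```

With a 30% threshold, a namespace that cost 1,000 last week and 1,400 this week (a 40% increase) would fire, while one that rose to 1,250 (25%) would not.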
Budget threshold
Triggers when cumulative spend or projected monthly cost exceeds a fixed budget.
Example: Alert when the production cluster is projected to exceed its £4,000 monthly budget.
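To make "projected monthly cost" concrete, here is a minimal sketch that assumes a simple linear run-rate projection (month-to-date spend divided by days elapsed, multiplied by days in the month). How CostPilot actually projects spend is not specified here, and the function names are hypothetical.

```python
import calendar
from datetime import date


def projected_monthly_cost(spend_to_date: float, today: date) -> float:
    """Linear run-rate projection (an assumption, not CostPilot's documented model)."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def exceeds_budget(spend_to_date: float, today: date, budget: float) -> bool:
    """True when cumulative spend or the projected monthly cost is above the budget."""
    return spend_to_date > budget or projected_monthly_cost(spend_to_date, today) > budget
```

For instance, 2,200 spent by 14 February projects to roughly 4,400 for the month, which would exceed a 4,000 budget even though cumulative spend is still below it.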
Metric threshold
Triggers when a resource metric (CPU, memory, idle cost) crosses a defined value.
Example: Alert when cluster idle cost exceeds £500/day, or when CPU efficiency drops below 40%.
Idle resource
Triggers when persistently idle or unscheduled resources are detected — workloads consuming reserved capacity with no meaningful utilisation.
Example: Alert when a namespace has had more than 25% idle cost for three consecutive evaluation periods.
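The "consecutive evaluation periods" condition in the example can be illustrated with a short check over recent idle-cost fractions. The function name and the representation of each period as a fraction between 0 and 1 are assumptions made for this sketch.

```python
from typing import Sequence


def idle_streak_fires(idle_fractions: Sequence[float],
                      threshold: float = 0.25,
                      periods: int = 3) -> bool:
    """True when each of the most recent `periods` evaluations exceeds the threshold.

    `idle_fractions` is ordered oldest to newest; each value is idle cost
    divided by total cost for one evaluation period.
    """
    if len(idle_fractions) < periods:
        return False  # not enough history yet
    return all(f > threshold for f in idle_fractions[-periods:])
```

A single period below the threshold resets the streak, so a namespace that dips to 20% idle cost would need three further periods above 25% before the rule fires.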
Anomaly detection
Triggers when CostPilot detects a statistically significant deviation from recent baseline behaviour. No manual threshold required — the engine learns from your historical spend patterns.
Example: Alert when spend for a cluster deviates more than 2 standard deviations from its rolling 7-day baseline.
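The example above can be read as a z-score test against a rolling baseline. The sketch below shows only that idea; CostPilot's detection engine is not described in detail here, so the use of the sample standard deviation and the handling of a flat baseline are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(baseline: Sequence[float], today: float, max_sigma: float = 2.0) -> bool:
    """True when `today` deviates from the baseline mean by more than `max_sigma` standard deviations."""
    if len(baseline) < 2:
        return False  # assumption: too little history to judge
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    if sigma == 0:
        # Assumption: with a perfectly flat baseline, any change counts as a deviation.
        return today != mu
    return abs(today - mu) / sigma > max_sigma
```

Here `baseline` would hold the previous seven daily spend values for the cluster, and `today` the value being evaluated.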
Delivery channels
Alerts can be delivered to any combination of the following channels. Channels are configured separately at Settings → Alerts → Channels and then referenced from alert rules.
| Channel | Notes |
|---|---|
| Slack | Incoming webhook URL. Rich-formatted messages with severity colour coding. |
| Microsoft Teams | Incoming webhook URL. Adaptive card payload. |
| PagerDuty | Integration key (Events API v2). Severity maps to PagerDuty urgency. |
| Datadog | API key + app key. Posts as a Datadog event. |
| Grafana | Grafana API token + URL. Posts to a Grafana annotation or alert. |
| AlertManager | Endpoint URL. Posts in Alertmanager-compatible format. |
| Jira | API token + project key. Creates or updates a Jira issue. |
| Webhook | Any HTTPS endpoint. JSON POST with HMAC-SHA256 signature. |
Channel filters
Each channel can be restricted to only receive alerts that match specific criteria:
- Severity filter — Only deliver critical, warning, or info alerts (or any combination)
- Category filter — Only deliver alerts from specific rule categories (e.g. cost, resource)
- Cluster filter — Only deliver alerts from specific clusters
An empty filter means “deliver all”. Filters are evaluated with AND logic: an alert must pass every non-empty filter on a channel to be delivered.
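The filter semantics described above (empty means "deliver all", non-empty filters combined with AND) can be sketched as follows. The `ChannelFilter` class and the alert field names `category` and `cluster` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelFilter:
    # An empty set means "no restriction" for that dimension.
    severities: set = field(default_factory=set)
    categories: set = field(default_factory=set)
    clusters: set = field(default_factory=set)


def should_deliver(alert: dict, channel_filter: ChannelFilter) -> bool:
    """An alert is delivered only if it passes every non-empty filter."""
    checks = [
        (channel_filter.severities, alert.get("severity")),
        (channel_filter.categories, alert.get("category")),
        (channel_filter.clusters, alert.get("cluster")),
    ]
    return all(not allowed or value in allowed for allowed, value in checks)
```

A channel filtered to critical severity and the prod-eu-1 cluster would therefore skip a critical alert from another cluster, because both non-empty filters must match.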
Verification
After saving a channel, use the Test button to send a test message. CostPilot records whether the test succeeded and marks the channel as verified.
Configuring an alert rule
Navigate to Settings → Alerts → Rules and click New alert rule.
Step 1 — Choose scope
Select what the alert monitors:
- Account — Total spend across all clusters
- Cluster — A specific cluster
- Namespace — A specific namespace within a cluster
- Label — A specific label value (e.g. team=payments)
Step 2 — Set the threshold
Configure the trigger condition for your chosen rule type. For cost spike and budget threshold rules:
- Threshold value — Amount in your account’s display currency, or percentage
- Evaluation period — How often CostPilot evaluates the condition
- Comparison window — For percentage change rules: compare to yesterday, last week, or last month
Step 3 — Set severity
Choose a severity level for alerts fired by this rule:
- Critical — Requires immediate attention
- Warning — Investigate soon
- Info — Informational, no immediate action required
Step 4 — Save and activate
Give the rule a descriptive name and save. Rules activate as soon as they are saved, so use Dry run first if you want to evaluate the rule against current data before it goes live.
Viewing and managing fired alerts
The /alerts route provides an operational view of all fired alerts across four tabs:
Overview
A summary dashboard showing alerts fired in the last 24 hours, delivery success rate, active rule count, and unhealthy channel count. A recent alerts timeline gives a quick view of the last 20 fired alerts with one-click acknowledge and resolve actions.
Active
All unresolved alerts that need attention. Each row shows:
- Severity and rule name
- The message that triggered the alert
- Cluster scope (if applicable)
- Time since firing
- Delivery status icons per channel (✓ sent, ✕ failed, ↺ retrying)
Row actions:
- Acknowledge — Mark as seen and under investigation
- Resolve — Mark as resolved
- Retry (on failed delivery rows) — Re-attempt delivery to a specific channel
Expand any row to see the full per-channel delivery detail including error messages.
History
Paginated, filterable list of all fired alerts. Filter by severity, alert status (active / acknowledged / resolved), and cluster. Expand rows to view full delivery attempt detail.
Channel Health
A health card for each configured delivery channel showing:
- 7-day delivery success rate (colour-coded: green ≥ 90%, amber ≥ 70%, red < 70%)
- Total deliveries and failure count
- Time since last successful delivery
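The colour bands for the 7-day success rate map onto a simple threshold function. This is a sketch with hypothetical names; returning "unknown" for a channel with no deliveries is an assumption, since that case is not described above.

```python
def health_colour(successes: int, total: int) -> str:
    """Map a delivery success rate to the colour bands used on the health card."""
    if total == 0:
        return "unknown"  # assumption: no deliveries in the window
    rate = successes * 100 / total
    if rate >= 90:
        return "green"
    if rate >= 70:
        return "amber"
    return "red"
```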
Webhook payload
Webhook channel deliveries are sent as a JSON POST signed with HMAC-SHA256. The signature is provided in the X-CostPilot-Signature header: the hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret.
{
  "alert_id": "alrt_01hxyz",
  "rule_name": "Production cluster daily budget",
  "rule_type": "budget_threshold",
  "severity": "critical",
  "triggered_at": "2026-02-28T14:23:00Z",
  "scope": "cluster",
  "scope_value": "prod-eu-1",
  "threshold": 4000.00,
  "actual_value": 4230.50,
  "currency": "EUR",
  "message": "Production cluster daily spend is projected to exceed the monthly budget."
}
The currency field reflects your account’s configured display currency (set in Settings → General). Threshold values are stored and delivered in the same currency.
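On the receiving side, the signature can be checked by recomputing the HMAC over the raw request body and comparing it with the header value. The sketch below uses Python's standard `hmac` and `hashlib` modules; it assumes the header carries the bare hex digest with no prefix, and the function name is hypothetical.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw body and compare in constant time."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Assumption: the header holds only the hex digest (no "sha256=" style prefix).
    received = signature_header.strip().lower()
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

Verify against the bytes exactly as received, before any JSON parsing or re-serialisation, because even a whitespace change in the body produces a different digest.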
Rate limiting
To protect external services, CostPilot enforces a delivery rate limit of 60 notifications per minute per tenant. Deliveries that exceed this limit are marked as failed and are not retried automatically. You can retry failed deliveries manually from the Active tab.
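A per-tenant limit of this kind can be illustrated with a sliding-window counter. Whether CostPilot uses a sliding or a fixed window is not stated, so the windowing strategy, class name, and method names below are all assumptions.

```python
from collections import defaultdict, deque


class TenantRateLimiter:
    """Allows at most `limit` deliveries per tenant within any `window_seconds` span."""

    def __init__(self, limit: int = 60, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._sent = defaultdict(deque)  # tenant id -> timestamps of allowed deliveries

    def allow(self, tenant_id: str, now: float) -> bool:
        sent = self._sent[tenant_id]
        # Drop timestamps that have aged out of the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) >= self.limit:
            return False  # over the limit: the delivery would be marked as failed
        sent.append(now)
        return True
```

In this sketch a rejected delivery is not recorded against the window, so a later manual retry is judged only against deliveries that were actually sent.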
Delivery guarantees
CostPilot records every delivery attempt per channel with a full audit trail (status, timestamp, error message). Failed deliveries are visible in the alert history and can be retried manually.
Each channel delivery attempt is independent — a failure on one channel does not affect delivery to others.
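The independence of channel deliveries, together with the per-attempt audit record, can be pictured as a loop that isolates each channel's failure. This is a sketch under stated assumptions: the function name, the record fields, and the representation of a channel as a callable are hypothetical.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List


def deliver_to_all(alert: dict, channels: Dict[str, Callable[[dict], None]]) -> List[dict]:
    """Attempt delivery on every channel and return one audit record per attempt."""
    attempts = []
    for name, send in channels.items():
        record = {
            "channel": name,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "status": "sent",
            "error": None,
        }
        try:
            send(alert)
        except Exception as exc:  # a failure here must not stop the other channels
            record["status"] = "failed"
            record["error"] = str(exc)
        attempts.append(record)
    return attempts
```

Each record keeps the status, timestamp, and error message, which is the information a manual retry would need to target a single failed channel.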
Start with a daily budget alert on your largest cluster at 110% of your expected daily spend. This catches anomalies without generating noise from normal variance.