

When a rule fires repeatedly on the same problem, a flat list of events doesn't tell you when the issue started, whether it's still happening, or how long it's been going on. Alert episodes fill that gap. Each episode is a persistent record of one issue on one series, from first breach through recovery, with every evaluation appended to the same history. Nothing is overwritten.

Every alert episode moves through these states:

inactive → pending → active → recovering → inactive

| State | What it means |
| --- | --- |
| Inactive | Problem fully resolved. You get a recovery notification. |
| Pending | Errors detected, but the system is waiting to confirm it's a real problem before fully alerting. |
| Active | Problem confirmed and ongoing. This is when you get notified. |
| Recovering | Errors have stopped, but the system is waiting to confirm it's truly resolved. |

Activation and recovery thresholds control how many consecutive evaluations must agree, or how long the condition must persist, before an episode transitions between states. Refer to Configure a rule to learn more about these settings.
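To make the transitions concrete, here is a minimal TypeScript sketch of the lifecycle described above. It is an illustration only, not Kibana's implementation: the class name and counters are invented, and the behavior of a recovering episode that breaches again (returning toward active) is an assumption.

```ts
type EpisodeState = 'inactive' | 'pending' | 'active' | 'recovering';

// Illustrative sketch only; names and edge-case behavior are assumptions.
class EpisodeLifecycle {
  state: EpisodeState = 'inactive';
  private breachStreak = 0; // consecutive evaluations where the condition matched
  private cleanStreak = 0;  // consecutive evaluations where it did not

  constructor(
    private activationCount: number, // evaluations needed to confirm a problem
    private recoveryCount: number,   // evaluations needed to confirm recovery
  ) {}

  evaluate(breaching: boolean): EpisodeState {
    if (breaching) {
      this.breachStreak += 1;
      this.cleanStreak = 0;
      if (this.state === 'inactive') this.state = 'pending';
      if (this.state === 'pending' || this.state === 'recovering') {
        if (this.breachStreak >= this.activationCount) this.state = 'active';
      }
    } else {
      this.cleanStreak += 1;
      this.breachStreak = 0;
      if (this.state === 'pending') this.state = 'inactive'; // assumed: breach cleared before confirmation
      if (this.state === 'active') this.state = 'recovering';
      if (this.state === 'recovering' && this.cleanStreak >= this.recoveryCount) {
        this.state = 'inactive';
      }
    }
    return this.state;
  }
}
```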

Suppose a service starts throwing errors at 10:00am and stops at 10:45am. Your rule runs in Alert mode every 5 seconds. Here's how one episode covers the entire incident, from detection to resolution:

  1. 10:00am - The rule detects errors. A new episode is created. With no activation threshold configured, it moves immediately from pending to active.
  2. 10:00am–10:45am - The rule continues detecting errors on every run. The same episode stays active. No new episodes are created.
  3. 10:45am - Errors stop. The episode moves to recovering. Without a recovery threshold, it transitions immediately to inactive.

One problem is tracked in one episode, even though the rule ran hundreds of times while the condition was ongoing.
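Running the same incident through the sketch above makes the point visible. With no thresholds configured, both counts are 1:

```ts
const episode = new EpisodeLifecycle(1, 1);

episode.evaluate(true);  // 10:00:00, errors detected: returns 'active' (pending is confirmed instantly)
episode.evaluate(true);  // 10:00:05, still failing: still 'active', same episode
// ...roughly 540 identical evaluations over 45 minutes, all the same episode...
episode.evaluate(false); // 10:45:00, errors stop: returns 'inactive' (recovering is confirmed instantly)
```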

A series is the ongoing relationship between a rule and one specific thing it monitors.

Suppose your rule monitors services. Each service it tracks has its own series: one for checkout-service, one for payment-service, and so on. A series exists for as long as that rule keeps monitoring that service.

Think of it like a patient's medical file. The file exists as long as the patient is in the system. Individual health incidents come and go, but the file persists.

For the fields that identify a series in alert event documents, refer to Rule event and field reference.

An episode lives inside a series. A series can contain many episodes over its lifetime, one for each time that service had a problem.

Series: checkout-service
│
├── Episode 1: errors on April 10 (active → inactive)
├── Episode 2: errors on April 15 (active → inactive)
└── Episode 3: errors on April 18 (active right now)

The series is the container. Episodes are the individual problems that happened within it. When the series breaches again after recovering, a new episode starts.

This means you can track "the checkout service was broken from 02:14 to 03:21" and "the payment service was broken at the same time" as separate episodes, even when both come from the same rule.
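As a rough mental model, the containment could be typed like this. This is a conceptual sketch only, not the shape of the stored documents (alert event documents are flat; refer to Rule event and field reference for the real series fields):

```ts
// Conceptual model only; apart from the episode status values,
// these field names are hypothetical.
interface Episode {
  id: string;                // shared by every document from one problem
  status: 'inactive' | 'pending' | 'active' | 'recovering';
  startedAt: string;         // first breach
  endedAt?: string;          // unset while the problem is ongoing
}

interface Series {
  ruleId: string;            // the rule doing the monitoring
  target: string;            // the one thing it watches, e.g. 'checkout-service'
  episodes: Episode[];       // one per distinct problem over the series' lifetime
}
```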

**Tip:** Snooze operates at the series level, not the episode level. If you snooze checkout-service, you're silencing all notifications from that series for the next X hours, regardless of how many new episodes start during that time. You're quieting a specific ongoing situation, not a single alert.
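A sketch of that behavior, reusing the Series type from above. The snooze record's shape is hypothetical; only the series-level semantics come from this page:

```ts
interface Snooze {
  target: string; // the series being silenced, e.g. 'checkout-service'
  until: Date;
}

// A snooze silences the whole series: every current and future episode
// inside the window, not just the episode that was firing when you snoozed.
function shouldNotify(series: Series, snoozes: Snooze[], now: Date): boolean {
  return !snoozes.some((s) => s.target === series.target && now < s.until);
}
```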

| Concept | Analogy |
| --- | --- |
| Rule | A security camera watching the building |
| Series | The camera's feed for one specific door |
| Episode | A specific incident caught on that feed |
| Rule events | The individual video frames |

The camera runs continuously (rule), always watching door 3 (series). One night someone breaks in. That's an episode. The frames captured during the break-in are the rule events.

Every time a rule finds a match, it writes a document to .rule-events. Whether that document is a signal or an alert depends on the rule's mode, and that choice determines whether the system only records what happened or actively tracks it through to resolution.

A signal is a one-time observation. The system writes it and moves on, no lifecycle, no notifications, no follow-up. An alert participates in an episode. The system links it to every other document from the same problem, tracks the lifecycle states, and routes notifications through action policies.

| Type | What it is | When it's created |
| --- | --- | --- |
| Signal | A point-in-time record that the query matched (type: signal). Stored in .rule-events. | Rules in Detect mode |
| Alert | A lifecycle-tracked episode with type: alert and episode.* fields. Stored in .rule-events. | Rules in Alert mode |
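As a sketch, the two document shapes might be typed like this, based only on the fields named on this page (type, episode.id, episode.status). Refer to Rule event and field reference for the full schema:

```ts
// Hypothetical shapes: only `type` and the episode fields come from this page.
interface SignalEvent {
  '@timestamp': string;
  type: 'signal';            // Detect mode: a point-in-time record, no lifecycle
}

interface AlertEvent {
  '@timestamp': string;
  type: 'alert';             // Alert mode: participates in an episode
  episode: {
    id: string;              // shared by every document from the same problem
    status: 'inactive' | 'pending' | 'active' | 'recovering';
  };
}
```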

A rule in Detect mode only writes signals. It never opens episodes, so action policies have nothing to match against.

Alert events are stored in .rule-events. Triage actions (acknowledge, snooze, resolve) are stored in .alert-actions. Both are queryable in Discover.

Both .rule-events and .alert-actions are data streams: append-only, time-series stores optimized for writes. On every rule evaluation, Kibana writes a new document to .rule-events rather than updating the previous one. Each document is a point-in-time snapshot: the episode.status field records the lifecycle state the episode was in at that exact evaluation. Nothing is overwritten.

Because every evaluation produces its own document, you can reconstruct the full history of an episode by querying all documents that share the same episode.id. Refer to Query alerts and signals in Discover for example queries.
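For illustration, reconstructing one episode's timeline from documents you have already fetched could look like this, reusing the AlertEvent shape above. For real Discover queries, refer to Query alerts and signals in Discover:

```ts
// Collect the append-only snapshots for one problem and order them in time.
function episodeHistory(docs: AlertEvent[], episodeId: string): AlertEvent[] {
  return docs
    .filter((d) => d.episode.id === episodeId)
    .sort((a, b) => a['@timestamp'].localeCompare(b['@timestamp']));
}
```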

Retention is managed automatically through ILM. Older backing indices move through storage tiers and are deleted when the retention window expires. You do not need to manually remove documents. Kibana manages versioning, retention, and lifecycle for both streams. Do not change their mappings or index settings.