Kibana alerting v2 alerts
When a rule fires repeatedly on the same problem, a flat list of events doesn't tell you when the issue started, whether it's still happening, or how long it's been going on. Alert episodes fill that gap. Each episode is a persistent record of one issue on one series, from first breach through recovery, with every evaluation appended to the same history. Nothing is overwritten.
Every alert episode moves through these states:
inactive → pending → active → recovering → inactive
| State | What it means |
|---|---|
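| Inactive | No open problem: either nothing has breached yet, or the previous problem is fully resolved. You get a recovery notification when an episode returns to this state. |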
| Pending | Errors detected, but the system is waiting to confirm it's a real problem before fully alerting. |
| Active | Problem confirmed and ongoing. This is when you get notified. |
| Recovering | Errors have stopped, but the system is waiting to confirm it's truly resolved. |
Activation and recovery thresholds control how many consecutive evaluations must agree, or how long the condition must persist, before transitioning. Refer to Configure a rule to learn more about these settings.
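To make the transitions concrete, here is a minimal sketch of the lifecycle as a small state machine. It is not Kibana's implementation. It assumes count-based thresholds only, and the names `Episode`, `step`, `activation_count`, and `recovery_count` are hypothetical. It also assumes that a clean evaluation cancels `pending` and that a new breach cancels `recovering`.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    status: str = "inactive"
    streak: int = 0  # consecutive evaluations pointing toward the next state


def step(ep: Episode, breached: bool,
         activation_count: int = 1, recovery_count: int = 1) -> str:
    """Apply one rule evaluation to the episode and return its new status."""
    if ep.status in ("inactive", "pending"):
        if not breached:
            # Assumption: a clean evaluation cancels a pending episode.
            ep.status, ep.streak = "inactive", 0
        else:
            ep.streak += 1
            if ep.streak >= activation_count:
                ep.status, ep.streak = "active", 0
            else:
                ep.status = "pending"
    else:  # "active" or "recovering"
        if breached:
            # Assumption: a new breach cancels a recovering episode.
            ep.status, ep.streak = "active", 0
        else:
            ep.streak += 1
            if ep.streak >= recovery_count:
                ep.status, ep.streak = "inactive", 0
            else:
                ep.status = "recovering"
    return ep.status


ep = Episode()
runs = [True, True, True, False, True, False, False]
print([step(ep, b, activation_count=3, recovery_count=2) for b in runs])
# ['pending', 'pending', 'active', 'recovering', 'active', 'recovering', 'inactive']
```

With both counts left at 1, the first breach reports `active` and the first clean evaluation reports `inactive`, which mirrors the case where no thresholds are configured.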
Suppose a service starts throwing errors at 10:00am and stops at 10:45am. Your rule runs in Alert mode every 5 seconds. Here's how one episode covers the entire incident, from detection to resolution:
- 10:00am - The rule detects errors. A new episode is created. With no activation threshold configured, it moves immediately from pending to active.
- 10:00am–10:45am - The rule continues detecting errors on every run. The same episode stays active. No new episodes are created.
- 10:45am - Errors stop. The episode moves to recovering. Without a recovery threshold, it transitions immediately to inactive.
One problem is tracked in one episode, even though the rule ran hundreds of times while the condition was ongoing.
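The arithmetic behind that timeline can be checked with a short simulation. This is only an illustration with no thresholds configured. The function name `simulate`, the `extra_runs` parameter, and the calendar date are made up for the example.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(seconds=5)
START = datetime(2024, 1, 1, 10, 0)   # errors begin (the date is arbitrary)
END = datetime(2024, 1, 1, 10, 45)    # errors stop


def simulate(start, end, interval, extra_runs=3):
    """Return (episodes_opened, breaching_runs) for one continuous outage."""
    episodes_opened = 0
    breaching_runs = 0
    status = "inactive"
    t = start
    # Keep evaluating for a few runs after the outage ends.
    while t < end + extra_runs * interval:
        breached = t < end
        if breached:
            breaching_runs += 1
            if status == "inactive":
                episodes_opened += 1  # only the first breach opens an episode
            status = "active"
        else:
            status = "inactive"       # no recovery threshold: resolve at once
        t += interval
    return episodes_opened, breaching_runs


print(simulate(START, END, INTERVAL))  # (1, 540)
```

Forty-five minutes at one run every 5 seconds is 540 breaching evaluations, all attached to a single episode.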
A series is the ongoing relationship between a rule and one specific thing it monitors.
Suppose your rule monitors services. Each service it tracks has its own series: one for checkout-service, one for payment-service, and so on. A series exists for as long as that rule keeps monitoring that service.
Think of it like a patient's medical file. The file exists as long as the patient is in the system. Individual health incidents come and go, but the file persists.
For the fields that identify a series in alert event documents, refer to Rule event and field reference.
An episode lives inside a series. A series can contain many episodes over its lifetime, one for each time that service had a problem.
Series: checkout-service
│
├── Episode 1: errors on April 10 (active → inactive)
├── Episode 2: errors on April 15 (active → inactive)
└── Episode 3: errors on April 18 (active right now)
The series is the container. Episodes are the individual problems that happened within it. When the series breaches again after recovering, a new episode starts.
This means you can track "the checkout service was broken from 02:14 to 03:21" and "the payment service was broken at the same time" as separate episodes, even when both come from the same rule.
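The relationship between series and episodes can be sketched as a grouping step over evaluation results. This is a simplified model rather than how Kibana stores anything. The function `split_into_episodes`, the tuple layout, and the 04:00 breach are assumptions for illustration, and thresholds are ignored.

```python
def split_into_episodes(evaluations):
    """Group (timestamp, series, breached) tuples, sorted by time, into episodes per series."""
    episodes = {}       # series -> list of {"start": ..., "end": ...}
    open_episode = {}   # series -> the episode currently in progress
    for timestamp, series, breached in evaluations:
        if breached and series not in open_episode:
            # A breach with no open episode starts a new one in that series.
            episode = {"start": timestamp, "end": None}
            episodes.setdefault(series, []).append(episode)
            open_episode[series] = episode
        elif not breached and series in open_episode:
            # Recovery closes the open episode; the series itself lives on.
            open_episode.pop(series)["end"] = timestamp
    return episodes


evaluations = [
    ("02:14", "checkout-service", True),
    ("02:14", "payment-service", True),
    ("03:21", "checkout-service", False),
    ("03:21", "payment-service", False),
    ("04:00", "checkout-service", True),   # breaches again: a second episode
]
for series, eps in split_into_episodes(evaluations).items():
    print(series, eps)
```

Both services get their own episode for the overlapping outage, and checkout-service gets a second episode when it breaches again after recovering.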
Snooze operates at the series level, not the episode level. If you snooze checkout-service, you silence all notifications from that series for the duration you choose, regardless of how many new episodes start during that time. You're quieting a specific ongoing situation, not a single alert.
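One way to picture series-level snoozing is a lookup keyed by rule and series, with no episode ID involved. The sketch below is a toy model. `SnoozeRegistry`, its methods, and the `rule-1` identifier are hypothetical and do not correspond to a Kibana API.

```python
from datetime import datetime, timedelta


class SnoozeRegistry:
    """Tracks snoozes per (rule, series); episode IDs are deliberately not part of the key."""

    def __init__(self):
        self._until = {}

    def snooze(self, rule_id, series, now, hours):
        self._until[(rule_id, series)] = now + timedelta(hours=hours)

    def should_notify(self, rule_id, series, now):
        until = self._until.get((rule_id, series))
        return until is None or now >= until


now = datetime(2024, 1, 1, 9, 0)
registry = SnoozeRegistry()
registry.snooze("rule-1", "checkout-service", now, hours=4)

# A new episode that starts inside the window is still silenced...
print(registry.should_notify("rule-1", "checkout-service", now + timedelta(hours=1)))  # False
# ...while another series from the same rule is unaffected.
print(registry.should_notify("rule-1", "payment-service", now + timedelta(hours=1)))   # True
# Once the window expires, notifications resume.
print(registry.should_notify("rule-1", "checkout-service", now + timedelta(hours=5)))  # True
```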
| Concept | Analogy |
|---|---|
| Rule | A security camera watching the building |
| Series | The camera's feed for one specific door |
| Episode | A specific incident caught on that feed |
| Rule events | The individual video frames |
The camera runs continuously (rule), always watching door 3 (series). One night someone breaks in. That's an episode. The frames captured during the break-in are the rule events.
Every time a rule finds a match, it writes a document to .rule-events. Whether that document is a signal or an alert depends on the rule's mode, and that choice determines whether the system only records what happened or actively tracks it through to resolution.
A signal is a one-time observation. The system writes it and moves on: no lifecycle, no notifications, no follow-up. An alert participates in an episode. The system links it to every other document from the same problem, tracks the lifecycle states, and routes notifications through action policies.
| Type | What it is | When it's created |
|---|---|---|
| Signal | A point-in-time record that the query matched (type: signal). Stored in .rule-events. | Rules in Detect mode |
| Alert | A lifecycle-tracked episode with type: alert and episode.* fields. Stored in .rule-events. | Rules in Alert mode |
A rule in Detect mode only writes signals. It never opens episodes, so action policies have nothing to match against.
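The difference between the two document types can be sketched as follows. Only `type`, `episode.id`, and `episode.status` come from this page. The `@timestamp` and `series` keys, the flat dotted-key layout, and the functions `build_event` and `can_match_action_policy` are assumptions made for the example.

```python
def build_event(mode, timestamp, series, episode_id=None, episode_status=None):
    """Build a simplified rule event document for the given rule mode."""
    doc = {"@timestamp": timestamp, "series": series}  # hypothetical shared fields
    if mode == "detect":
        doc["type"] = "signal"            # recorded once, no lifecycle
    elif mode == "alert":
        doc["type"] = "alert"
        doc["episode.id"] = episode_id    # links the document to its episode
        doc["episode.status"] = episode_status
    else:
        raise ValueError(f"unknown mode: {mode}")
    return doc


def can_match_action_policy(doc):
    """Only alert documents belong to an episode that a policy could act on."""
    return doc["type"] == "alert"


signal = build_event("detect", "2024-01-01T10:00:00Z", "checkout-service")
alert = build_event("alert", "2024-01-01T10:00:00Z", "checkout-service",
                    episode_id="ep-1", episode_status="active")
print(can_match_action_policy(signal), can_match_action_policy(alert))  # False True
```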
Alert events are stored in .rule-events. Triage actions (acknowledge, snooze, resolve) are stored in .alert-actions. Both are queryable in Discover.
Both .rule-events and .alert-actions are data streams: append-only, time-series stores optimized for writes. On every rule evaluation, Kibana writes a new document to .rule-events rather than updating the previous one. Each document is a point-in-time snapshot. The episode.status field records the lifecycle state the episode was in at that exact evaluation. Nothing is overwritten.
Because every evaluation produces its own document, you can reconstruct the full history of an episode by querying all documents that share the same episode.id. Refer to Query alerts and signals in Discover for example queries.
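As a rough illustration of that reconstruction, the sketch below filters a handful of in-memory documents by `episode.id` and collapses them into status transitions. The sample documents, the flat dotted keys, and the helpers `episode_history` and `status_transitions` are invented for the example. In practice you would run the equivalent query in Discover.

```python
def episode_history(docs, episode_id):
    """Return every document for one episode, oldest first."""
    matching = [d for d in docs if d.get("episode.id") == episode_id]
    return sorted(matching, key=lambda d: d["@timestamp"])


def status_transitions(history):
    """Collapse per-evaluation snapshots into (timestamp, status) change points."""
    transitions = []
    for doc in history:
        status = doc["episode.status"]
        if not transitions or transitions[-1][1] != status:
            transitions.append((doc["@timestamp"], status))
    return transitions


docs = [  # deliberately out of order, as if pulled from several backing indices
    {"@timestamp": "2024-01-01T10:00:05Z", "type": "alert", "episode.id": "ep-1", "episode.status": "active"},
    {"@timestamp": "2024-01-01T10:00:00Z", "type": "alert", "episode.id": "ep-1", "episode.status": "pending"},
    {"@timestamp": "2024-01-01T11:00:00Z", "type": "alert", "episode.id": "ep-2", "episode.status": "pending"},
    {"@timestamp": "2024-01-01T10:00:10Z", "type": "alert", "episode.id": "ep-1", "episode.status": "active"},
    {"@timestamp": "2024-01-01T10:45:00Z", "type": "alert", "episode.id": "ep-1", "episode.status": "inactive"},
    {"@timestamp": "2024-01-01T10:30:00Z", "type": "signal"},
]

for timestamp, status in status_transitions(episode_history(docs, "ep-1")):
    print(timestamp, status)
```

Because each document is an immutable snapshot, the transitions fall out of a simple sort and scan. No document has to be diffed against an earlier version of itself.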
Retention is managed automatically through ILM. Older backing indices move through storage tiers and are deleted when the retention window expires. You do not need to manually remove documents. Kibana manages versioning, retention, and lifecycle for both streams. Do not change their mappings or index settings.
- View, manage, and reference alerts: Open the alert episodes table, triage active episodes, and acknowledge, snooze, or resolve them.
- Query alerts and signals in Discover: Use ES|QL to query .rule-events and .alert-actions for ad hoc analysis and dashboards.
- Alert states and fields reference: Look up lifecycle states, field names, and episode document structure.
- Notifications: Set up action policies to route alert episodes to the right people and channels.