﻿---
title: Generating alerts for anomaly detection jobs
description: This guide explains how to create alerts that notify you automatically when an anomaly is detected in an anomaly detection job, or when issues occur that...
url: https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-configuring-alerts
products:
  - Elasticsearch
  - Machine Learning
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Generating alerts for anomaly detection jobs
This guide explains how to create alerts that notify you automatically when an anomaly is detected in an [anomaly detection job](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs), or when issues occur that affect job performance.
Kibana's alerting features support two types of machine learning rules, which run scheduled checks on your anomaly detection jobs:
<definitions>
  <definition term="Anomaly detection alert">
    Checks job results for anomalies that match your defined conditions and raises an alert when found.
  </definition>
  <definition term="Anomaly detection jobs health">
    Monitors the operational status of a job and alerts you if issues occur (such as a stopped datafeed or memory limit errors).
  </definition>
</definitions>

<tip>
  If you have created rules for specific anomaly detection jobs and you want to monitor whether those jobs work as expected, use anomaly detection jobs health rules.
</tip>

If the conditions of a rule are met, an alert is created, and any associated actions (such as sending an email or Slack message) are triggered. For example, you can configure a rule that checks a job every 15 minutes for anomalies with a high score and sends a notification when one is found.
In **Stack Management > Rules**, you can create both types of machine learning rules. In the **Machine Learning** app, you can create only anomaly detection alert rules; create them from the anomaly detection job wizard after you start the job or from the anomaly detection job list.

## Prerequisites

Before you begin, make sure that:
- You have at least one running [anomaly detection job](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs).
- You have appropriate [user permissions](https://www.elastic.co/elastic/docs-builder/docs/3028/deploy-manage/users-roles) to create and manage alert rules.
- If you would like to send notifications about alerts (such as Slack messages, emails, or webhooks), make sure you have configured the necessary [connectors](https://www.elastic.co/docs/reference/kibana/connectors-kibana).


## Anomaly detection alert rules

Anomaly detection alert rules monitor whether the results of an anomaly detection job contain anomalies that match the rule conditions.
To set up an anomaly detection alert rule:
1. Open **Rules**: find **Stack Management > Rules** in the main menu or use the [global search field](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/find-and-organize/find-apps-and-objects).
2. Select the **Anomaly detection** rule type.
3. Select the [anomaly detection job](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs) that the rule applies to.
4. Select a type of machine learning result. You can create rules based on bucket, record, or influencer results.
5. (Optional) Configure the `anomaly_score` that triggers the action.
   The `anomaly_score` indicates the significance of a given anomaly compared to
   previous anomalies. The default severity threshold is 75, which means every
   anomaly with an `anomaly_score` of 75 or higher triggers the associated action.
6. <applies-to>Elastic Stack: Generally available since 9.3</applies-to><applies-to>Elastic Cloud Serverless: Generally available</applies-to> (Optional) To narrow down the list of anomalies that the rule looks for, add an **Anomaly filter**. This feature uses KQL and is only available for the Record and Influencer result types.
   In the **Anomaly filter** field, enter a KQL query that specifies fields or conditions to alert on. You can set up the following conditions:
   - One or more partitioning or influencer fields in the anomaly results match the specified conditions.
   - The actual or typical scores in the anomalies match the specified conditions.
   For example, say you've set up alerting for an anomaly detection job that has `partition_field = "response.keyword"` as the detector. If you were only interested in being alerted on `response.keyword = 404`, enter `partition_field_value: "404"` into the **Anomaly filter** field. When the rule runs, it will only alert on anomalies with `partition_field_value: "404"`.
   <note>
   When you edit the KQL query, suggested filter-by fields appear. To compare actual and typical values for any fields, use operators such as `>` (greater than), `<` (less than), or `=` (equal to).
   </note>
7. (Optional) Turn on **Include interim results** to include results that are created by the anomaly detection job *before* a bucket is finalized. These results might disappear after the bucket is fully processed. Include interim results to get notified earlier about potential anomalies, even if they might be false positives. Don't include interim results if you want to get notified only about anomalies of fully processed buckets.
8. (Optional) Configure **Advanced settings**:
   - Configure the _Lookback interval_ to define how far back to query previous anomalies during each condition check. By default, its value is derived from the bucket span of the job and the query delay of the datafeed. Setting the lookback interval lower than the default is not recommended, as it might result in missed anomalies.
   - Configure the _Number of latest buckets_ to specify how many buckets to check to obtain the highest anomaly score found during the _Lookback interval_. The alert is created based on the highest-scoring anomaly from the most anomalous bucket.
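To illustrate how the severity threshold (step 5) and the advanced settings (step 8) interact, the following Python sketch picks the highest anomaly score from the latest buckets within the lookback window and compares it against the threshold. The function names and the exact lookback formula are illustrative assumptions, not the product's implementation:

```python
from datetime import timedelta

def default_lookback(bucket_span: timedelta, query_delay: timedelta) -> timedelta:
    """Hypothetical default lookback derived from the job's bucket span
    and the datafeed's query delay (the exact formula is an assumption)."""
    return bucket_span * 2 + query_delay

def top_anomaly(bucket_scores: list[float], latest_buckets: int,
                threshold: float = 75.0):
    """Return the highest anomaly score among the latest N buckets if it
    meets the severity threshold (default 75), otherwise None."""
    recent = bucket_scores[-latest_buckets:]
    best = max(recent, default=0.0)
    return best if best >= threshold else None
```

For example, `top_anomaly([10, 80, 40], latest_buckets=2)` returns `80`, because the highest score among the two most recent buckets meets the default threshold, while `latest_buckets=1` returns `None`.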

<tip>
  You can preview how the rule would perform on existing data:
  - Define the _check interval_ to specify how often the rule conditions are evaluated. It’s recommended to set this close to the job’s bucket span.
  - Click **Test**.
  The preview shows how many alerts would have been triggered during the selected time range.
</tip>

![Advanced settings and testing the rule condition](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/ml-anomaly-alert-advanced.jpg)

9. Set how often to check the rule conditions by selecting a time value and unit under **Rule schedule**.
10. Specify the rule's scope, which determines the [Kibana feature privileges](/elastic/docs-builder/docs/3028/deploy-manage/users-roles/cluster-or-deployment-auth/kibana-privileges#kibana-feature-privileges) that a role must have to access the rule and its alerts. Depending on your role's access, you can select one of the following:

- <applies-to>Elastic Stack: Generally available since 9.3</applies-to> **All**: (Default) Roles must have the appropriate privileges for one of the following features:
  - Infrastructure metrics (**Observability > Infrastructure**)
  - Logs (**Observability > Logs**)
  - APM (**Observability > APM and User Experience**)
  - Synthetics (**Observability > Synthetics and Uptime**)
  - Stack rules (**Management > Stack Rules**)
- **Logs**: Roles must have the appropriate **Observability > Logs** feature privileges.
- **Metrics**: Roles must have the appropriate **Observability > Infrastructure** feature privileges.
- **Stack Management**: Roles must have the appropriate **Management > Stack Rules** feature privileges.

For example, if you select **All**, a role with feature access to logs can view or edit the rule and its alerts from the Observability or the Stack Rules **Rules** page.
11. (Optional) Configure **Advanced options**:
    - Define the number of consecutive matches required before an alert is triggered under **Alert delay**.
    - Enable or disable **Flapping Detection** to reduce noise from frequently changing alerts. You can customize the flapping detection settings if you need different thresholds for detecting flapping behavior.

![Rule schedule and advanced settings](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/ml-anomaly-rule-schedule-advanced.jpg)
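The two advanced options can be sketched in Python. The names, window handling, and the default change threshold below are illustrative assumptions, not Kibana's actual values:

```python
def should_alert(match_history: list[bool], alert_delay: int) -> bool:
    """Alert delay (sketch): fire only after the rule condition has
    matched on `alert_delay` consecutive runs."""
    if alert_delay <= 0:
        return bool(match_history) and match_history[-1]
    tail = match_history[-alert_delay:]
    return len(tail) == alert_delay and all(tail)

def is_flapping(active_history: list[bool], change_threshold: int = 4) -> bool:
    """Flapping detection (sketch): an alert whose active/inactive state
    changed at least `change_threshold` times in the recent window is
    treated as flapping, so its notifications can be suppressed."""
    changes = sum(1 for a, b in zip(active_history, active_history[1:]) if a != b)
    return changes >= change_threshold
```

With an alert delay of 3, a condition that matched on only the last two runs does not yet trigger an alert; an alert that toggled state four times in its recent history is flagged as flapping.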

Next, define the [actions](#ml-configuring-alert-actions) that occur when the rule conditions are met.

## Anomaly detection jobs health rules

Anomaly detection jobs health rules monitor job health and alert you if an operational issue occurs that may prevent the job from detecting anomalies.
To set up an anomaly detection jobs health rule:
1. Open **Rules**: find **Stack Management > Rules** in the main menu or use the [global search field](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/find-and-organize/find-apps-and-objects).
2. Select the **Anomaly detection jobs** rule type.

![Selecting Anomaly detection jobs health rules type](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/ml-anomaly-create-anomaly-job-health.png)

3. Include jobs and groups:
   - Select the job or group that the rule applies to. If you add more jobs to the selected group later, they are automatically included the next time the rule conditions are checked. To apply the rule to all your jobs, you can use a special character (`*`). This ensures that any jobs created after the rule is saved are automatically included.
   - (Optional) To exclude jobs that are not critically important, use the **Exclude** field.
4. Enable the health check types you want to apply. All checks are enabled by default. At least one check must be enabled to create the rule. The following health checks are available:
   - **Datafeed is not started**: Notifies if the corresponding datafeed of the job is not started but the job is in an opened state. The notification message recommends the necessary actions to solve the error.
   - **Model memory limit reached**: Notifies if the model memory status of the job reaches the soft or hard model memory limit. Optimize your job by following [these guidelines](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/anomaly-detection-scale) or consider [amending the model memory limit](/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/anomaly-detection-scale#set-model-memory-limit).
   - **Data delay has occurred**: Notifies when the job missed some data. You can define the threshold for the amount of missing documents you get alerted on by setting _Number of documents_. You can control the lookback interval for checking delayed data with _Time interval_. Refer to the [Handling delayed data](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-delayed-data-detection) page to see what to do about delayed data.
   - **Errors in job messages**: Notifies when the job messages contain error messages. Review the notification; it contains the error messages, the corresponding job IDs, and recommendations on how to fix the issue. This check looks for job errors that occur after the rule is created; it does not look at historic behavior.

![Selecting health checkers](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/ml-health-check-config.jpg)

5. Set how often to check the rule conditions by selecting a time value and unit under **Rule schedule**. It is recommended to select an interval that is close to the bucket span of the job.
6. (Optional) Configure **Advanced options**:
   - Define the number of consecutive matches required before an alert is triggered under **Alert delay**.
   - Enable or disable **Flapping Detection** to reduce noise from frequently changing alerts. You can customize the flapping detection settings if you need different thresholds for detecting flapping behavior.

![Rule schedule and advanced settings](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/ml-anomaly-rule-schedule-advanced.jpg)

Next, define the [actions](#ml-configuring-alert-actions) that occur when the rule conditions are met.

## Actions

You can send notifications when the rule conditions are met and when they are no longer met. These rules support:
- **Alert summaries:** Combine multiple alerts into a single notification, sent at regular intervals.
- **Per-alert actions for anomaly detection:** Trigger an action when an anomaly score meets the defined condition.
- **Per-alert actions for job health:** Trigger an action when an issue is detected in a job’s health status (for example, a stopped datafeed or memory issue).
- **Recovery actions:** Notify when a previously triggered alert returns to a normal state.

To set up an action:
1. Select a connector.

<important>
  Each action uses a connector, which stores connection information for a Kibana
  service or supported third-party integration, depending on where you want to
  send the notifications. For example, you can use a Slack connector to send a
  message to a channel. Or you can use an index connector that writes a JSON
  object to a specific index. For details about creating connectors, refer to
  [Connectors](/elastic/docs-builder/docs/3028/deploy-manage/manage-connectors#creating-new-connector).
</important>

2. Set the action frequency. Choose whether you want to send:
   - **Summary of alerts**: Groups multiple alerts into a single notification at each check interval or on a custom schedule.
   - **A notification for each alert**: Sends individual alerts as they are triggered, recovered, or change state.

<dropdown title="Example: Summary of alerts">
  You can choose to create a summary of alerts on:
  - **Each check interval**: Sends a summary every time the rule runs (for example, every 5 minutes).
  - **Custom interval**: Sends a summary less often, on a schedule you define (for example, every hour), which helps reduce notification noise. A custom action interval cannot be shorter than the rule's check interval.
  For example, send Slack notifications that summarize the new, ongoing, and recovered alerts:
  ![Adding an alert summary action to the rule](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/ml-anomaly-alert-action-summary.png)
</dropdown>
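The interval constraint described above reduces to a simple check, and the benefit of a custom summary interval can be quantified. The function names below are illustrative:

```python
def valid_summary_interval(check_interval_s: int, summary_interval_s: int) -> bool:
    """A custom summary interval must not be shorter than the rule's
    check interval (sketch of the constraint described above)."""
    return summary_interval_s >= check_interval_s

def runs_per_summary(check_interval_s: int, summary_interval_s: int) -> int:
    """How many rule runs are collapsed into one summary notification."""
    return summary_interval_s // check_interval_s
```

For instance, a rule that runs every 5 minutes (300 s) with an hourly summary (3600 s) collapses 12 rule runs into a single notification.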

<dropdown title="Example: For each alert">
  Choose how often the action runs:
  - at each check interval,
  - only when the alert status changes, or
  - at a custom action interval.
  For *anomaly detection alert rules*, you must also choose whether the action runs when the anomaly score
  matches the condition or when the alert recovers:
  ![Adding an action for each alert in the rule](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/ml-anomaly-alert-action-score-matched.png)
  For *anomaly detection jobs health rules*, choose whether the action runs when the issue is
  detected or when it is recovered:
  ![Adding an action for each alert in the rule](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/ml-health-check-action.png)
</dropdown>
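The per-alert frequency choices above can be sketched as a single decision function. The `run_when` values are illustrative labels, not the exact UI option names:

```python
def action_fires(prev_status: str, curr_status: str, run_when: str) -> bool:
    """Decide whether a per-alert action runs on this rule check
    (a sketch; statuses are "active" or "recovered").

    run_when is one of "each_interval", "status_change", or "on_recover"
    (illustrative names)."""
    if run_when == "each_interval":
        return curr_status == "active"
    if run_when == "status_change":
        return curr_status != prev_status
    if run_when == "on_recover":
        return prev_status == "active" and curr_status == "recovered"
    return False
```

With `"status_change"`, an alert that stays active across two consecutive runs produces no new notification; with `"each_interval"`, it notifies on every run while active.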

3. Specify that actions run only when they match a KQL query or occur within a specific time frame.
4. Use variables to customize the notification message. Click the icon above the message field to view available variables, or refer to [action variables](#action-variables). For example:

![Customizing your message](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/ml-anomaly-alert-messages.png)

After you save the configurations, the rule appears in the
**Stack Management > Rules** list, where you can check its status and review an
overview of its configuration.
When an alert occurs for an anomaly detection alert rule, its name is always
the job ID of the anomaly detection job that triggered it. You can
review how these alerts correlate with the anomaly detection
results in the **Anomaly explorer** by using the **Anomaly timeline** swimlane
and the **Alerts** panel.
If necessary, you can snooze rules to prevent them from generating actions. For
more details, refer to
[Snooze and disable rules](/elastic/docs-builder/docs/3028/explore-analyze/alerting/alerts/create-manage-rules#controlling-rules).

## Action variables

The following variables are specific to the machine learning rule types. An asterisk (`*`)
marks the variables that you can use in actions related to recovered alerts.
You can also specify [variables common to all rules](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/alerting/alerts/rule-action-variables).
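Action messages reference these variables with mustache-style `{{...}}` placeholders. The sketch below shows only the substitution idea; Kibana's real template engine supports far more than this:

```python
import re

def render_message(template: str, context: dict) -> str:
    """Substitute {{dotted.path}} placeholders from a nested context
    dict (a simplified sketch of mustache-style templating)."""
    def repl(match):
        value = context
        for part in match.group(1).strip().split("."):
            value = value[part]  # walk the dotted path, e.g. context.score
        return str(value)
    return re.sub(r"\{\{(.*?)\}\}", repl, template)
```

For example, rendering `"Job {{context.jobIds}} scored {{context.score}}"` against a context containing those keys produces a human-readable notification line.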

### Anomaly detection alert action variables

Every anomaly detection alert has the following action variables:
<definitions>
  <definition term="context.anomalyExplorerUrl^*^">
    URL to open in the Anomaly Explorer.
  </definition>
  <definition term="context.isInterim">
    Indicates if top hits contain interim results.
  </definition>
  <definition term="context.jobIds^*^">
    List of job IDs that triggered the alert.
  </definition>
  <definition term="context.message^*^">
    A preconstructed message for the alert.
  </definition>
  <definition term="context.score">
    Anomaly score at the time of the notification action.
  </definition>
  <definition term="context.timestamp">
    The bucket timestamp of the anomaly.
  </definition>
  <definition term="context.timestampIso8601">
    The bucket timestamp of the anomaly in ISO8601 format.
  </definition>
  <definition term="context.topInfluencers">
    The list of top influencers. Limited to a maximum of 3 documents.
  </definition>
</definitions>

<dropdown title="Properties of `context.topInfluencers`">
  <definitions>
    <definition term="influencer_field_name">
      The field name of the influencer.
    </definition>
    <definition term="influencer_field_value">
      The entity that influenced, contributed to, or was to blame for the anomaly.
    </definition>
    <definition term="score">
      The influencer score. A normalized score between 0 and 100, which shows the influencer's overall contribution to the anomalies.
    </definition>
  </definitions>
</dropdown>

<definitions>
  <definition term="context.topRecords">
    The list of top records. Limited to a maximum of 3 documents.
  </definition>
</definitions>

<dropdown title="Properties of `context.topRecords`">
  <definitions>
    <definition term="actual">
      The actual value for the bucket.
    </definition>
    <definition term="by_field_value">
      The value of the by field.
    </definition>
    <definition term="field_name">
      Certain functions require a field to operate on, for example, `sum()`. For those functions, this value is the name of the field to be analyzed.
    </definition>
    <definition term="function">
      The function in which the anomaly occurs, as specified in the detector configuration. For example, `max`.
    </definition>
    <definition term="over_field_name">
      The field used to split the data.
    </definition>
    <definition term="partition_field_value">
      The value of the field used to segment the analysis.
    </definition>
    <definition term="score">
      A normalized score between 0 and 100, which is based on the probability of the anomalousness of this record.
    </definition>
    <definition term="typical">
      The typical value for the bucket, according to analytical modeling.
    </definition>
  </definitions>
</dropdown>


### Anomaly detection health action variables

Every health check has two main variables: `context.message` and
`context.results`. The properties of `context.results` may vary based on the
type of check. You can find the possible properties for all the checks below.

#### Datafeed is not started

<definitions>
  <definition term="context.message^*^">
    A preconstructed message for the alert.
  </definition>
  <definition term="context.results">
    Contains the following properties:
  </definition>
</definitions>

<dropdown title="Properties of `context.results`">
  <definitions>
    <definition term="datafeed_id^*^">
      The datafeed identifier.
    </definition>
    <definition term="datafeed_state^*^">
      The state of the datafeed. It can be `starting`, `started`, `stopping`, or `stopped`.
    </definition>
    <definition term="job_id^*^">
      The job identifier.
    </definition>
    <definition term="job_state^*^">
      The state of the job. It can be `opening`, `opened`, `closing`, `closed`, or `failed`.
    </definition>
  </definitions>
</dropdown>
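Given the states listed above, the condition this check evaluates can be sketched as:

```python
def datafeed_not_started(job_state: str, datafeed_state: str) -> bool:
    """Sketch of the health check: the job is in an opened state but
    its datafeed is not running."""
    return job_state == "opened" and datafeed_state != "started"
```

A job that is `opened` with a `stopped` datafeed triggers the check; a `closed` job with a stopped datafeed does not, because a closed job is not expected to receive data.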


#### Model memory limit reached

<definitions>
  <definition term="context.message^*^">
    A preconstructed message for the rule.
  </definition>
  <definition term="context.results">
    Contains the following properties:
  </definition>
</definitions>

<dropdown title="Properties of `context.results`">
  <definitions>
    <definition term="job_id^*^">
      The job identifier.
    </definition>
    <definition term="memory_status^*^">
      The status of the mathematical model. It can have one of the following values:
      - `soft_limit`: The model used more than 60% of the configured memory limit and older unused models will be pruned to free up space. In categorization jobs, no further category examples will be stored.
      - `hard_limit`: The model used more space than the configured memory limit. As a result, not all incoming data was processed.
      The `memory_status` is `ok` for recovered alerts.
    </definition>
    <definition term="model_bytes^*^">
      The number of bytes of memory used by the models.
    </definition>
    <definition term="model_bytes_exceeded^*^">
      The number of bytes over the high limit for memory usage at the last allocation failure.
    </definition>
    <definition term="model_bytes_memory_limit^*^">
      The upper limit for model memory usage.
    </definition>
    <definition term="log_time^*^">
      The timestamp of the model size statistics according to server time. Time formatting is based on the Kibana settings.
    </definition>
    <definition term="peak_model_bytes^*^">
      The peak number of bytes of memory ever used by the model.
    </definition>
  </definitions>
</dropdown>
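Using the thresholds described above (soft limit above 60% of the configured limit, hard limit above the limit itself), the mapping from memory usage to `memory_status` can be sketched as follows; the actual logic lives in the ML backend:

```python
def memory_status(model_bytes: int, model_memory_limit_bytes: int) -> str:
    """Map model memory usage to a status value, per the thresholds
    described above (a sketch, not the backend implementation)."""
    if model_bytes > model_memory_limit_bytes:
        return "hard_limit"
    if model_bytes > 0.6 * model_memory_limit_bytes:
        return "soft_limit"
    return "ok"
```

For example, 70 MB of model memory against a 100 MB limit maps to `soft_limit`, while 120 MB maps to `hard_limit`.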


#### Data delay has occurred

<definitions>
  <definition term="context.message^*^">
    A preconstructed message for the rule.
  </definition>
  <definition term="context.results">
    For recovered alerts, `context.results` is either empty (when there is no delayed data) or the same as for an active alert (when the number of missing documents is less than the *Number of documents* threshold set by the user).
    Contains the following properties:
  </definition>
</definitions>

<dropdown title="Properties of `context.results`">
  <definitions>
    <definition term="annotation^*^">
      The annotation corresponding to the data delay in the job.
    </definition>
    <definition term="end_timestamp^*^">
      Timestamp of the latest finalized buckets with missing documents. Time formatting is based on the Kibana settings.
    </definition>
    <definition term="job_id^*^">
      The job identifier.
    </definition>
    <definition term="missed_docs_count^*^">
      The number of missed documents.
    </definition>
  </definitions>
</dropdown>
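The alerting condition for this check reduces to a threshold comparison on `missed_docs_count`; a sketch:

```python
def data_delay_alerts(missed_docs_count: int, number_of_documents: int) -> bool:
    """Sketch: alert only when the count of missing documents reaches
    the *Number of documents* threshold configured on the rule."""
    return missed_docs_count >= number_of_documents
```

This is also why a recovered alert can still carry a non-empty `context.results`: some documents may be missing, just fewer than the configured threshold.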


#### Errors in job messages

<definitions>
  <definition term="context.message^*^">
    A preconstructed message for the rule.
  </definition>
  <definition term="context.results">
    Contains the following properties:
  </definition>
</definitions>

<dropdown title="Properties of `context.results`">
  <definitions>
    <definition term="timestamp">
      Timestamp of the error message.
    </definition>
    <definition term="job_id">
      The job identifier.
    </definition>
    <definition term="message">
      The error message.
    </definition>
    <definition term="node_name">
      The name of the node that runs the job.
    </definition>
  </definitions>
</dropdown>