Error count threshold rule
Alert when the number of errors in a service exceeds a defined threshold. Error count rules can be set at the environment level, service level, and error group level.
Filters and conditions
Filter the errors coming from your application to apply an Error count threshold rule to a specific service (SERVICE), environment (ENVIRONMENT), or error grouping key (ERROR GROUPING KEY). Alternatively, you can use a KQL filter to limit the scope of the alert by toggling on the Use KQL Filter option.
Tip
Similar errors are grouped together to make it easy to quickly see which errors are affecting your services and to take actions to rectify them. Each group of errors has a unique error grouping key — a hash of the stack trace and other properties.
Then, you can specify which conditions should result in an alert. This includes specifying:
- The number of errors that occurred (IS ABOVE).
- The timeframe in which the errors must occur (FOR THE LAST) in seconds, minutes, hours, or days.
Example
This example creates a rule for all production services that triggers an alert when there are more than 25 errors in the last five minutes.
Alternatively, you can use a KQL filter to limit the scope of the alert:
- Toggle on Use KQL Filter.
- Add a filter:
service.environment:"Production"
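The condition in this example can be sketched in a few lines of Python. This is only an illustration of the counting logic, not how the rule is implemented: the `ErrorEvent` class, the `exceeds_threshold` function, and the inclusive window boundaries are assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ErrorEvent:
    """Hypothetical stand-in for an error document (not an Elastic type)."""
    timestamp: datetime
    service: str
    environment: str


def exceeds_threshold(events, now, environment, threshold, window):
    """Return True when the filtered error count IS ABOVE the threshold."""
    window_start = now - window
    count = sum(
        1
        for event in events
        # Assumption: both ends of the window are treated as inclusive.
        if event.environment == environment
        and window_start <= event.timestamp <= now
    )
    return count > threshold  # strictly above, matching IS ABOVE


now = datetime(2024, 1, 1, 12, 0, 0)
# 26 sample errors, one every 10 seconds, all within the last five minutes.
events = [
    ErrorEvent(now - timedelta(seconds=10 * i), "checkout", "Production")
    for i in range(26)
]
print(exceeds_threshold(events, now, "Production", 25, timedelta(minutes=5)))
```

With exactly 25 matching errors the function returns False, because the condition requires the count to be above the threshold.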
Groups
Set one or more group alerts by fields to perform a composite aggregation against the selected fields. When any of these groups match the selected rule conditions, an alert is triggered per group.
When you select multiple groupings, the group name is separated by commas.
When you select Alert me if a group stops reporting data, the rule is triggered if a group that previously reported metrics does not report them again over the expected time period.
Example: Group by one field
If you group alerts by the service.name field and there are two services (Service A and Service B), when Service A matches the conditions but Service B doesn’t, one alert is triggered for Service A. If both groups match the conditions, alerts are triggered for both groups.
Example: Group by multiple fields
If you group alerts by both the service.name and service.environment fields, and there are two services (Service A and Service B) and two environments (Production and Staging), the composite aggregation forms multiple groups.
If the Service A, Production group matches the rule conditions, but the Service B, Staging group doesn’t, one alert is triggered for Service A, Production.
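The per-group behavior described in these examples can be illustrated with a short Python sketch. It assumes errors are plain dictionaries keyed by field name. The `alerts_per_group` function and the `", "` separator used to build group names are hypothetical choices for the illustration, not part of the product.

```python
from collections import Counter


def alerts_per_group(errors, group_by, threshold):
    """Count errors per group and return the names of groups above the threshold."""
    counts = Counter(
        # Build one group name per error from the selected fields.
        ", ".join(str(error[field]) for field in group_by)
        for error in errors
    )
    return sorted(name for name, count in counts.items() if count > threshold)


# Sample data: three errors for one group, one error for another.
errors = (
    [{"service.name": "Service A", "service.environment": "Production"}] * 3
    + [{"service.name": "Service B", "service.environment": "Staging"}]
)
print(alerts_per_group(errors, ["service.name", "service.environment"], 2))
```

Only the Service A, Production group exceeds the threshold of 2 in this sample, so it is the only group returned.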
Rule schedule
Define how often to evaluate the condition in seconds, minutes, hours, or days. Checks are queued so they run as close to the defined value as capacity allows.
Advanced options
Optionally define an Alert delay. An alert will only occur when the specified number of consecutive runs meet the rule conditions.
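As a rough illustration of the alert delay, the following Python sketch tracks consecutive matching runs. The function name `alert_states` and the representation of each run as a boolean are assumptions for the example.

```python
def alert_states(run_results, alert_delay):
    """Return, for each rule run, whether an alert would be active.

    run_results: booleans, True when a run met the rule conditions.
    alert_delay: number of consecutive matching runs required.
    """
    consecutive = 0
    states = []
    for matched in run_results:
        # A run that does not match resets the streak.
        consecutive = consecutive + 1 if matched else 0
        states.append(consecutive >= alert_delay)
    return states


runs = [True, True, False, True, True, True]
print(alert_states(runs, alert_delay=3))
```

In this sample, the alert only becomes active on the last run, which is the first point where three consecutive runs have met the conditions.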
Actions
Extend your rules by connecting them to actions that use built-in integrations.
Action types
Supported built-in integrations include:
- D3 Security
- IBM Resilient
- Index
- Jira
- Microsoft Teams
- Observability AI Assistant connector
- Opsgenie
- PagerDuty
- Server log
- ServiceNow ITOM
- ServiceNow ITSM
- ServiceNow SecOps
- Slack
- Swimlane
- Torq
- Webhook
- xMatters
Note
Some connector types are paid commercial features, while others are free. For a comparison of the Elastic subscription levels, go to the subscription page.
Action frequency
After you select a connector, you must set the action frequency. You can create a summary of alerts on each check interval or on a custom interval. Alternatively, you can choose how often the action runs (for example, at each check interval, only when the alert status changes, or at a custom action interval).
You can also further refine the conditions under which actions run by specifying that actions only run when they match a KQL query or when an alert occurs within a specific time frame:
- If alert matches query: Enter a KQL query that defines field-value pairs or query conditions that must be met for notifications to send. The query only searches alert documents in the indices specified for the rule.
- If alert is generated during timeframe: Set timeframe details. Notifications are only sent if alerts are generated within the timeframe you define.
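The three action frequency choices mentioned above (each check interval, only on status change, custom interval) can be sketched as a simple decision function. The mode names `on_check_interval`, `on_status_change`, and `on_custom_interval` are hypothetical labels for this example, not identifiers from the product, and the custom interval is assumed to behave like a minimum gap between runs.

```python
from datetime import datetime, timedelta


def should_run_action(mode, now, status_changed, last_run=None, custom_interval=None):
    """Decide whether an action runs on this check, for a given frequency mode."""
    if mode == "on_check_interval":
        return True
    if mode == "on_status_change":
        return status_changed
    if mode == "on_custom_interval":
        # Assumption: run if the action never ran, or the interval has elapsed.
        return last_run is None or now - last_run >= custom_interval
    raise ValueError(f"unknown mode: {mode}")


now = datetime(2024, 1, 1, 12, 0, 0)
print(should_run_action("on_status_change", now, status_changed=False))
print(
    should_run_action(
        "on_custom_interval",
        now,
        status_changed=False,
        last_run=now - timedelta(minutes=30),
        custom_interval=timedelta(hours=1),
    )
)
```

Both calls return False here: the status did not change, and only 30 minutes of the one-hour custom interval have elapsed.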
Action variables
A default message is provided as a starting point for your alert. To customize the message and add more context, click the icon above the message text box and select from the list of available variables.
Tip
To add variables to alert messages, use Mustache template syntax, for example {{variable.name}}.
The following variables are specific to this rule type. You can also specify variables common to all rules.
context.alertDetailsUrl
- Link to the alert troubleshooting view for further context and details. This will be an empty string if the server.publicBaseUrl is not configured.
context.environment
- The environment the alert is created for
context.errorGroupingKey
- The error grouping key the alert is created for
context.errorGroupingName
- The error grouping name the alert is created for
context.interval
- The length and unit of the time period where the alert conditions were met
context.reason
- A concise description of the reason for the alert
context.serviceName
- The service the alert is created for
context.threshold
- Any trigger value above this value will cause the alert to fire
context.transactionName
- The transaction name the alert is created for
context.triggerValue
- The value that breached the threshold and triggered the alert
context.viewInAppUrl
- Link to the alert source
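To show how these variables might appear in a message, here is a minimal Python sketch that substitutes {{dotted.name}} placeholders. It is a simplified stand-in for illustration only and does not implement full Mustache (no sections, escaping, or partials). The `render` function, the message wording, and all sample values are assumptions.

```python
import re


def render(template, variables):
    """Replace {{dotted.name}} placeholders with values from a nested dict."""

    def lookup(match):
        value = variables
        for key in match.group(1).split("."):
            # Missing keys render as an empty string.
            value = value.get(key, "") if isinstance(value, dict) else ""
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


# Sample values only; real values are supplied when the alert is created.
variables = {
    "context": {
        "serviceName": "checkout",
        "environment": "Production",
        "triggerValue": 31,
        "threshold": 25,
    }
}
template = (
    "Error count for {{context.serviceName}} in {{context.environment}} "
    "is {{context.triggerValue}} (threshold: {{context.threshold}})."
)
print(render(template, variables))
```

With the sample values, this prints: Error count for checkout in Production is 31 (threshold: 25).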