ES|QL CATEGORIZE function
The CATEGORIZE function requires a platinum license.
- field: Expression to categorize.
- options: (Optional) Additional CATEGORIZE options, passed as function named parameters.
Groups text messages into categories of similarly formatted text values.
CATEGORIZE has the following limitations:
- can’t be used within other expressions
- can’t be used more than once in the groupings
- can’t be used or referenced within aggregate functions
- has to be the first grouping
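The limitations above can be illustrated with a short sketch against the same sample_data index used in the examples on this page. The commented-out commands are hypothetical counter-examples written for illustration only. Each one breaks one of the listed rules and is expected to be rejected; the exact error text is not shown because it may vary by version.

```esql
// Accepted: CATEGORIZE appears once, on its own, as the first grouping.
FROM sample_data
| STATS count = COUNT(*) BY category = CATEGORIZE(message), time_bucket = BUCKET(@timestamp, 1 HOUR)

// Expected to be rejected: nested inside another expression.
//   | STATS count = COUNT(*) BY category = TO_UPPER(CATEGORIZE(message))
// Expected to be rejected: used more than once in the groupings.
//   | STATS count = COUNT(*) BY c1 = CATEGORIZE(message), c2 = CATEGORIZE(message)
// Expected to be rejected: used inside an aggregate function.
//   | STATS count = COUNT(CATEGORIZE(message))
// Expected to be rejected: not the first grouping.
//   | STATS count = COUNT(*) BY time_bucket = BUCKET(@timestamp, 1 HOUR), category = CATEGORIZE(message)
```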
| field | options | result |
|---|---|---|
| keyword | | keyword |
| text | | keyword |
- analyzer: (keyword) Analyzer used to convert the field into tokens for text categorization.
- output_format: (keyword) The output format of the categories. Defaults to regex.
- similarity_threshold: (integer) The minimum percentage of token weight that must match for text to be added to the category bucket. Must be between 1 and 100. Larger values create narrower categories and increase memory usage. Defaults to 70.
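As a rough sketch of how these options might be supplied, the query below passes them as a map in the second argument. The map-literal syntax is an assumption based on how other ES|QL functions accept named parameters, and the value 85 is an arbitrary illustration, so verify both against your Elasticsearch version before relying on them.

```esql
FROM sample_data
// Assumed syntax: options are passed as a map literal in the second argument.
// similarity_threshold is raised from its default of 70 to 85, which should produce narrower categories.
// output_format is set to its documented default, regex, only to show the key.
| STATS count = COUNT(*) BY category = CATEGORIZE(message, {"similarity_threshold": 85, "output_format": "regex"})
| SORT count DESC
```

Because a higher threshold narrows the categories, the same data may split into more categories than it does with the default settings.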
This example categorizes server log messages and counts the messages in each category.
FROM sample_data
| STATS count=COUNT() BY category=CATEGORIZE(message)
| count:long | category:keyword |
|---|---|
| 3 | .*?Connected.+?to.*? |
| 3 | .*?Connection.+?error.*? |
| 1 | .*?Disconnected.*? |
Group log message categories by time interval by combining CATEGORIZE with BUCKET in the same BY clause.
FROM sample_data
| STATS count = COUNT(*) BY category = CATEGORIZE(message), time_bucket = BUCKET(@timestamp, 1 HOUR)
| SORT time_bucket DESC, count DESC, category
| count:long | category:keyword | time_bucket:datetime |
|---|---|---|
| 3 | .*?Connection.+?error.*? | 2023-10-23T13:00:00.000Z |
| 1 | .*?Connected.+?to.*? | 2023-10-23T13:00:00.000Z |
| 1 | .*?Disconnected.*? | 2023-10-23T13:00:00.000Z |
| 2 | .*?Connected.+?to.*? | 2023-10-23T12:00:00.000Z |
Surface one representative raw message per category by combining CATEGORIZE with SAMPLE.
FROM sample_data
| STATS sample_message = SAMPLE(message, 1) BY category = CATEGORIZE(message)
| SORT category
| sample_message:keyword | category:keyword |
|---|---|
| Connected to 10.1.0.1 | .*?Connected.+?to.*? |
| Connection error | .*?Connection.+?error.*? |
| Disconnected | .*?Disconnected.*? |
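The count and sample examples above could plausibly be merged into one query, since STATS accepts several aggregations for the same grouping. This is a minimal sketch that reuses only the functions and fields already shown on this page. No result table is given, because SAMPLE may return a different message on each run.

```esql
FROM sample_data
// A single STATS command computes both aggregates for each category.
| STATS count = COUNT(*), sample_message = SAMPLE(message, 1) BY category = CATEGORIZE(message)
// Most frequent categories first; ties are broken by category.
| SORT count DESC, category
```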