---
title: ES|QL CATEGORIZE function
description: 
url: https://www.elastic.co/elastic/docs-builder/docs/3016/reference/query-languages/esql/functions-operators/grouping-functions/categorize
products:
  - Elasticsearch
---

# ES|QL CATEGORIZE function
<note>
  The `CATEGORIZE` function requires a [platinum license](https://www.elastic.co/subscriptions).
</note>

<applies-to>
  - Elastic Stack: Generally available since 9.1
  - Elastic Stack: Preview in 9.0
</applies-to>


## Syntax

![Embedded](https://www.elastic.co/elastic/docs-builder/docs/3016/reference/query-languages/esql/images/functions/categorize.svg)


## Parameters

<definitions>
  <definition term="field">
    Expression to categorize.
  </definition>
  <definition term="options">
    (Optional) Additional categorization options, passed as [function named parameters](/elastic/docs-builder/docs/3016/reference/query-languages/esql/esql-syntax#esql-function-named-params). <applies-to>Elastic Stack: Generally available since 9.2</applies-to>
  </definition>
</definitions>


## Description

Groups text messages into categories of similarly formatted text values.

`CATEGORIZE` has the following limitations:
- It can’t be used within other expressions.
- It can’t be used more than once in the groupings.
- It can’t be used or referenced within aggregate functions.
- It must be the first grouping.
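
As a rough illustration of the limitations listed above, the sketch below shows one query shape that respects them, with commented-out variants that would be expected to be rejected. The `client_ip` field is an assumption about the `sample_data` index and is used here only to show a second grouping.

```esql
// Accepted shape: CATEGORIZE appears once, as the first grouping,
// and is not wrapped in another expression.
// `client_ip` is an assumed field, used only to illustrate a second grouping.
FROM sample_data
| STATS count = COUNT() BY category = CATEGORIZE(message), client_ip

// Variants expected to be rejected under the limitations above:
//   used within another expression:
//     | STATS count = COUNT() BY category = TO_UPPER(CATEGORIZE(message))
//   not the first grouping:
//     | STATS count = COUNT() BY client_ip, category = CATEGORIZE(message)
//   used within an aggregate function:
//     | STATS count = COUNT(CATEGORIZE(message))
```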


## Supported types


| field   | options | result  |
|---------|---------|---------|
| keyword |         | keyword |
| text    |         | keyword |


### Supported function named parameters

<definitions>
  <definition term="analyzer">
    (keyword) Analyzer used to convert the field into tokens for text categorization.
  </definition>
  <definition term="output_format">
    (keyword) The output format of the categories. Defaults to `regex`.
  </definition>
  <definition term="similarity_threshold">
    (integer) The minimum percentage of token weight that must match for text to be added to the category bucket. Must be between 1 and 100. Larger values create narrower categories and increase memory usage. Defaults to 70.
  </definition>
</definitions>
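
The named parameters above might be passed as in the following minimal sketch, which supplies an options map as the second argument to `CATEGORIZE`. The `{"name": value}` map form follows the ES|QL function named parameters syntax. The threshold value of 85 is arbitrary and chosen only for illustration, and `output_format` is set to its documented default. The query assumes a `sample_data` index with a `message` field and a stack version that supports the `options` argument.

```esql
// Raise the similarity threshold above the default of 70 to get narrower categories.
// The value 85 is illustrative, not a recommendation.
FROM sample_data
| STATS count = COUNT()
    BY category = CATEGORIZE(message, {"similarity_threshold": 85, "output_format": "regex"})
```

Per the parameter description, a higher `similarity_threshold` should yield more, narrower categories at the cost of additional memory.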


## Example

This example groups server log messages into categories and counts the messages in each category.
```esql
FROM sample_data
| STATS count=COUNT() BY category=CATEGORIZE(message)
```


| count:long | category:keyword         |
|------------|--------------------------|
| 3          | .*?Connected.+?to.*?     |
| 3          | .*?Connection.+?error.*? |
| 1          | .*?Disconnected.*?       |