---
title: Simple pattern tokenizer
description: The simple_pattern tokenizer uses a regular expression to capture matching text as terms. The set of regular expression features it supports is more limited...
url: https://www.elastic.co/elastic/docs-builder/docs/3028/reference/text-analysis/analysis-simplepattern-tokenizer
products:
  - Elasticsearch
---

# Simple pattern tokenizer
The `simple_pattern` tokenizer uses a regular expression to capture matching text as terms. The set of regular expression features it supports is more limited than the [`pattern`](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/text-analysis/analysis-pattern-tokenizer) tokenizer, but the tokenization is generally faster.

This tokenizer does not support splitting the input on a pattern match, unlike the [`pattern`](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/text-analysis/analysis-pattern-tokenizer) tokenizer. To split on pattern matches using the same restricted regular expression subset, see the [`simple_pattern_split`](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/text-analysis/analysis-simplepatternsplit-tokenizer) tokenizer.

This tokenizer uses [Lucene regular expressions](https://lucene.apache.org/core/10_0_0/core/org/apache/lucene/util/automaton/RegExp.html). For an explanation of the supported features and syntax, see [Regular Expression Syntax](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/query-languages/query-dsl/regexp-syntax).

The default pattern is the empty string, which produces no terms. This tokenizer should always be configured with a non-default pattern.
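The match-versus-split distinction above can be sketched with Python's `re` module as a rough analogue. Note this is only an illustration: Lucene's regular expression engine and its supported syntax differ from Python's, so behavior will not match for all patterns.

```python
import re

text = "fd-786-335-514-x"
pattern = r"[0123456789]{3}"

# simple_pattern keeps the text that MATCHES the pattern as terms
matched_terms = re.findall(pattern, text)
print(matched_terms)  # ['786', '335', '514']

# simple_pattern_split instead SPLITS on matches,
# keeping the text between them (empty strings dropped)
split_terms = [t for t in re.split(pattern, text) if t]
print(split_terms)  # ['fd-', '-', '-', '-x']
```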

## Configuration

The `simple_pattern` tokenizer accepts the following parameters:
<definitions>
  <definition term="pattern">
    [Lucene regular expression](https://lucene.apache.org/core/10_0_0/core/org/apache/lucene/util/automaton/RegExp.html), defaults to the empty string.
  </definition>
</definitions>


## Example configuration

This example configures the `simple_pattern` tokenizer to produce terms that are three-digit numbers:
```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "simple_pattern",
          "pattern": "[0123456789]{3}"
        }
      }
    }
  }
}
```

The analyzer can then be tested with an analyze request:
```json
{
  "analyzer": "my_analyzer",
  "text": "fd-786-335-514-x"
}
```

The above example produces these terms:
```text
[ 786, 335, 514 ]
```