Trim token filter

Removes leading and trailing whitespace from each token in a stream. While this can change the length of a token, the trim filter does not change a token’s offsets.

The trim filter uses Lucene’s TrimFilter.

Tip

Many commonly used tokenizers, such as the standard or whitespace tokenizer, remove whitespace by default. When using these tokenizers, you don’t need to add a separate trim filter.
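As a quick check of this tip, the following sketch runs whitespace-padded text through the whitespace tokenizer with no token filters. The sample text is an illustrative assumption. The request should return a single fox token with no surrounding whitespace, because the tokenizer itself discards it.

```console
GET _analyze
{
  "tokenizer": "whitespace",
  "text": " fox "
}
```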

To see how the trim filter works, you first need to produce a token containing whitespace.

The following analyze API request uses the keyword tokenizer to produce a token for " fox ".

GET _analyze
{
  "tokenizer" : "keyword",
  "text" : " fox "
}

The API returns the following response. Note that the " fox " token retains the original text’s whitespace and spans offsets 0 to 5.

{
  "tokens": [
    {
      "token": " fox ",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}

To remove the whitespace, add the trim filter to the previous analyze API request.

GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["trim"],
  "text" : " fox "
}

The API returns the following response. The returned fox token does not include any leading or trailing whitespace. Although the token is now shorter, its start_offset and end_offset are unchanged.

{
  "tokens": [
    {
      "token": "fox",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
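The trim filter tends to be most useful with tokenizers that split on a delimiter and leave the surrounding spaces in place. As a hedged sketch, the request below assumes an inline pattern tokenizer that splits on commas. The delimiter and the sample text are illustrative and not part of the example above.

```console
GET _analyze
{
  "tokenizer": {
    "type": "pattern",
    "pattern": ","
  },
  "filter": ["trim"],
  "text": "fox, dog"
}
```

Without the trim filter, the second token would keep its leading space (" dog"). With the filter, the tokens should come back as fox and dog.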

The following create index API request uses the trim filter to configure a new custom analyzer.

PUT trim_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_trim": {
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        }
      }
    }
  }
}
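
To check the new analyzer, you could run the analyze API against the trim_example index and name the analyzer directly. This is a minimal sketch that assumes the index above was created successfully. The sample text is reused from the earlier requests.

```console
GET trim_example/_analyze
{
  "analyzer": "keyword_trim",
  "text": " fox "
}
```

The response should match the trimmed output shown earlier: a single fox token with a start_offset of 0 and an end_offset of 5.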