Token count field type
A field of type token_count is really an integer field which accepts string values, analyzes them, then indexes the number of tokens in the string.
For instance:
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

PUT my-index-000001/_doc/1
{ "name": "John Smith" }

PUT my-index-000001/_doc/2
{ "name": "Rachel Alice Williams" }

GET my-index-000001/_search
{
  "query": {
    "term": {
      "name.length": 3
    }
  }
}
- The name field is a text field which uses the default standard analyzer.
- The name.length field is a token_count multi-field which will index the number of tokens in the name field.
- The term query on name.length matches only the document containing Rachel Alice Williams, because that value contains three tokens.
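The mapping and query above can be simulated in a few lines to show what actually gets indexed. This Python sketch analyzes each name value, keeps the token count as an integer, and evaluates the term query as an exact integer match. It is only an approximation. The regular expression is an assumed stand-in for the standard analyzer, which really follows Unicode text segmentation rules. The helper names standard_like_tokens and index_token_counts are hypothetical and not part of any Elasticsearch API.

```python
import re

# Assumed stand-in for the standard analyzer: runs of word characters,
# lowercased. The real analyzer uses Unicode text segmentation (UAX #29).
_WORD = re.compile(r"\w+", re.UNICODE)


def standard_like_tokens(text: str) -> list[str]:
    """Return lowercased word tokens, roughly approximating the standard analyzer."""
    return [m.group(0).lower() for m in _WORD.finditer(text)]


def index_token_counts(docs: dict[str, dict[str, str]]) -> dict[str, int]:
    """Map each document ID to the integer a token_count multi-field would hold."""
    return {doc_id: len(standard_like_tokens(doc["name"])) for doc_id, doc in docs.items()}


docs = {
    "1": {"name": "John Smith"},
    "2": {"name": "Rachel Alice Williams"},
}

name_length = index_token_counts(docs)
print(name_length)  # {'1': 2, '2': 3}

# A term query on name.length is an exact match on the stored integer.
matches = [doc_id for doc_id, length in name_length.items() if length == 3]
print(matches)  # ['2']
```

The original string is not changed. The multi-field only adds a numeric value alongside it, which is why name.length can be queried like any other integer field.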
The following parameters are accepted by token_count fields:
analyzer
- The analyzer which should be used to analyze the string value. Required. For best performance, use an analyzer without token filters.
enable_position_increments
- Indicates whether position increments should be counted. Set to false if you don’t want to count tokens removed by analyzer filters (like stop). Defaults to true.
doc_values
- Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false.
index
- Should the field be searchable? Accepts true (default) or false.
null_value
- Accepts a numeric value of the same type as the field which is substituted for any explicit null values. Defaults to null, which means the field is treated as missing.
store
- Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default).
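The effect of enable_position_increments is easiest to see with a stop filter in the analyzer. This Python sketch models the analyzed token stream as (term, position_increment) pairs. A removed stop word leaves a gap, so the next kept token carries an increment greater than 1. With increments enabled, the sketch sums the increments (counting positions). With them disabled, each token that advances the position counts once. It also substitutes null_value for an explicit null. This is an illustrative model under stated assumptions, not Elasticsearch's implementation. The Token type, the analyze_with_stop_filter and count_tokens functions, and the tiny STOP_WORDS set are all hypothetical. Stop words removed at the very end of a string are not modeled.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "of"}


class Token(NamedTuple):
    term: str
    position_increment: int  # 1 for adjacent tokens, >1 after removed tokens


def analyze_with_stop_filter(text: str) -> list[Token]:
    """Lowercase, split on word characters, and drop stop words.

    Each kept token's position increment includes the positions of any
    stop words removed immediately before it.
    """
    tokens: list[Token] = []
    gap = 0
    for word in re.findall(r"\w+", text.lower()):
        if word in STOP_WORDS:
            gap += 1  # removed token: remember the skipped position
            continue
        tokens.append(Token(word, 1 + gap))
        gap = 0
    return tokens


def count_tokens(value: Optional[str], enable_position_increments: bool = True,
                 null_value: Optional[int] = None) -> Optional[int]:
    """Return the integer this model would index for `value`."""
    if value is None:
        return null_value  # explicit null -> null_value (None means "missing")
    tokens = analyze_with_stop_filter(value)
    if enable_position_increments:
        # Count positions, so gaps left by removed stop words are included.
        return sum(t.position_increment for t in tokens)
    # Count each token that advances the position exactly once, ignoring gaps.
    return sum(min(1, t.position_increment) for t in tokens)


print(count_tokens("the quick brown fox"))                                    # 4
print(count_tokens("the quick brown fox", enable_position_increments=False))  # 3
print(count_tokens(None, null_value=0))                                       # 0
print(count_tokens(None))                                                     # None
```

In this model, the removed leading "the" still contributes a position when increments are enabled, giving 4 instead of 3. That matches the description of false as the setting that stops counting tokens removed by filters such as stop.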
Important
Synthetic _source is Generally Available only for TSDB indices (indices that have index.mode set to time_series). For other indices, synthetic _source is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
token_count fields support synthetic _source in their default configuration.