Stop token filter
Removes stop words from a token stream.
When not customized, the filter removes the following English stop words by default:
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
In addition to English, the stop filter supports predefined stop word lists for several languages. You can also specify your own stop words as an array or file.
The stop filter uses Lucene’s StopFilter.
Example
The following analyze API request uses the stop filter to remove the stop words "a" and "the" from "a quick fox jumps over the lazy dog":
```console
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "stop" ],
  "text": "a quick fox jumps over the lazy dog"
}
```
The filter produces the following tokens:
[ quick, fox, jumps, over, lazy, dog ]
Add to an analyzer
The following create index API request uses the stop filter to configure a new custom analyzer.
```console
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "stop" ]
        }
      }
    }
  }
}
```
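To check the analyzer you just configured, you can run the analyze API against the index and name the analyzer. This is a minimal sketch. It assumes the my-index-000001 index above exists and reuses the sample text from the earlier example.

```console
GET /my-index-000001/_analyze
{
  "analyzer": "my_analyzer",
  "text": "a quick fox jumps over the lazy dog"
}
```

This text contains only lowercase words separated by spaces, so the whitespace tokenizer splits it the same way the standard tokenizer did. The response should therefore list the same tokens: quick, fox, jumps, over, lazy, and dog.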
Configurable parameters
stopwords
- (Optional, string or array of strings) Language value, such as _arabic_ or _thai_. Defaults to _english_.
Each language value corresponds to a predefined list of stop words in Lucene. See Stop words by language for supported language values and their stop words.
Also accepts an array of stop words.
For an empty list of stop words, use _none_.
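To try a predefined language list without creating an index, you can define the filter inline in an analyze API request. The sketch below uses the _french_ value, and the sample sentence is illustrative only. Replacing "_french_" with an array such as [ "le", "la" ], or with "_none_", exercises the other accepted forms of the stopwords parameter.

```console
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "stop",
      "stopwords": "_french_"
    }
  ],
  "text": "le chat et la souris"
}
```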
stopwords_path
- (Optional, string) Path to a file that contains a list of stop words to remove.
This path must be absolute or relative to the config location, and the file must be UTF-8 encoded. Each stop word in the file must be separated by a line break.
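As a sketch, the following request points a custom stop filter at a stop word file instead of an inline list. The path analysis/my_stopwords.txt and the names my_file_stop_filter and my_file_analyzer are hypothetical. The file is assumed to sit relative to the config location, to be UTF-8 encoded, and to contain one stop word per line. It would typically need to be present on each node that hosts the index.

```console
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_file_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "my_file_stop_filter" ]
        }
      },
      "filter": {
        "my_file_stop_filter": {
          "type": "stop",
          "stopwords_path": "analysis/my_stopwords.txt"
        }
      }
    }
  }
}
```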
ignore_case
- (Optional, Boolean) If true, stop word matching is case-insensitive. For example, if true, a stop word of "the" matches and removes "The", "THE", or "the". Defaults to false.
remove_trailing
- (Optional, Boolean) If true, the last token of a stream is removed if it’s a stop word. Defaults to true.
Set this parameter to false when using the filter with a completion suggester. This ensures that a query like "green a" matches and suggests "green apple" while other stop words are still removed. For more information about completion suggesters, refer to Suggester examples.
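A sketch of that setup might disable remove_trailing in a custom filter used by a completion field's analyzer. It assumes a completion field named suggest. The names my_suggest_analyzer and my_suggest_stop_filter are hypothetical, and the lowercase filter is an illustrative choice rather than a requirement.

```console
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_suggest_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_suggest_stop_filter" ]
        }
      },
      "filter": {
        "my_suggest_stop_filter": {
          "type": "stop",
          "remove_trailing": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "analyzer": "my_suggest_analyzer"
      }
    }
  }
}
```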
Customize
To customize the stop filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following request creates a custom case-insensitive stop filter that removes stop words from the _english_ stop words list:
```console
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [ "my_custom_stop_words_filter" ]
        }
      },
      "filter": {
        "my_custom_stop_words_filter": {
          "type": "stop",
          "ignore_case": true
        }
      }
    }
  }
}
```
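To confirm that matching is case-insensitive, you can call the index's analyze API with the whitespace tokenizer and reference the custom filter by name. This sketch assumes the index above exists and that index-scoped analyze requests can resolve filters defined in the index settings. With ignore_case set to true, "The" should be removed even though the _english_ list stores the word in lowercase. "Quick" and "Fox" should keep their original case because no lowercase filter is applied.

```console
GET /my-index-000001/_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "my_custom_stop_words_filter" ],
  "text": "The Quick Fox"
}
```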
You can also specify your own list of stop words. For example, the following request creates a custom case-insensitive stop filter that removes only the stop words "and", "is", and "the":
```console
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [ "my_custom_stop_words_filter" ]
        }
      },
      "filter": {
        "my_custom_stop_words_filter": {
          "type": "stop",
          "ignore_case": true,
          "stopwords": [ "and", "is", "the" ]
        }
      }
    }
  }
}
```
Stop words by language
The following list contains the supported language values for the stopwords parameter, each with a link to its predefined stop words in Lucene.
_arabic_
- Arabic stop words
_armenian_
- Armenian stop words
_basque_
- Basque stop words
_bengali_
- Bengali stop words
_brazilian_ (Brazilian Portuguese)
- Brazilian Portuguese stop words
_bulgarian_
- Bulgarian stop words
_catalan_
- Catalan stop words
_cjk_ (Chinese, Japanese, and Korean)
- CJK stop words
_czech_
- Czech stop words
_danish_
- Danish stop words
_dutch_
- Dutch stop words
_english_
- English stop words
_estonian_
- Estonian stop words
_finnish_
- Finnish stop words
_french_
- French stop words
_galician_
- Galician stop words
_german_
- German stop words
_greek_
- Greek stop words
_hindi_
- Hindi stop words
_hungarian_
- Hungarian stop words
_indonesian_
- Indonesian stop words
_irish_
- Irish stop words
_italian_
- Italian stop words
_latvian_
- Latvian stop words
_lithuanian_
- Lithuanian stop words
_norwegian_
- Norwegian stop words
_persian_
- Persian stop words
_portuguese_
- Portuguese stop words
_romanian_
- Romanian stop words
_russian_
- Russian stop words
_serbian_
- Serbian stop words
_sorani_
- Sorani stop words
_spanish_
- Spanish stop words
_swedish_
- Swedish stop words
_thai_
- Thai stop words
_turkish_