Language analyzers
A set of analyzers for analyzing text in specific languages. The following types are supported: arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english, estonian, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, serbian, sorani, spanish, swedish, turkish, thai.
All analyzers support setting custom stopwords, either internally in the config or by using an external stopwords file by setting stopwords_path. Check Stop Analyzer for more details.
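As an illustration, the following Python sketch builds the index settings for a built-in language analyzer with custom stopwords, given either inline or through a file. The helper name language_analyzer_settings, the analyzer name my_english, and the file path are hypothetical; only the type, stopwords, and stopwords_path keys come from the text above.

```python
import json


def language_analyzer_settings(name, language, stopwords=None, stopwords_path=None):
    """Build index settings for a built-in language analyzer.

    The helper and the analyzer `name` are illustrative; `stopwords` and
    `stopwords_path` are the analyzer parameters described above.
    """
    analyzer = {"type": language}
    if stopwords is not None:
        # Stopwords configured directly in the analyzer config.
        analyzer["stopwords"] = stopwords
    if stopwords_path is not None:
        # External stopwords file (the path used below is hypothetical).
        analyzer["stopwords_path"] = stopwords_path
    return {"settings": {"analysis": {"analyzer": {name: analyzer}}}}


inline = language_analyzer_settings(
    "my_english", "english", stopwords=["a", "an", "the"]
)
from_file = language_analyzer_settings(
    "my_english", "english", stopwords_path="analysis/english_stop.txt"
)

# Either dict can be serialized and sent as the body of a create-index request.
print(json.dumps(inline, indent=2))
```

The two variants differ only in which key is set, so switching between an inline list and a file does not change the rest of the index settings.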
The stem_exclusion parameter allows you to specify an array of lowercase words that should not be stemmed. Internally, this functionality is implemented by adding the keyword_marker token filter with the keywords set to the value of the stem_exclusion parameter.
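The correspondence between the two forms can be sketched in Python. The helper keyword_marker_for is hypothetical and the two excluded words are made-up examples; the sketch only shows how a stem_exclusion list maps onto a keyword_marker filter definition.

```python
def keyword_marker_for(stem_exclusion):
    """Return the keyword_marker filter definition that corresponds to a
    stem_exclusion list (hypothetical helper, for illustration only)."""
    return {"type": "keyword_marker", "keywords": list(stem_exclusion)}


excluded = ["organization", "organizations"]

# Built-in analyzer configured through the stem_exclusion parameter.
builtin = {"type": "english", "stem_exclusion": excluded}

# Filter definition a custom analyzer would use for the same word list.
marker = keyword_marker_for(excluded)

print(builtin)
print(marker)
```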
The following analyzers support setting a custom stem_exclusion list: arabic, armenian, basque, bengali, bulgarian, catalan, czech, dutch, english, finnish, french, galician, german, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, portuguese, romanian, russian, serbian, sorani, spanish, swedish, turkish.
The built-in language analyzers can be reimplemented as custom analyzers (as described below) in order to customize their behaviour.
If you do not intend to exclude words from being stemmed (the equivalent of the stem_exclusion parameter above), then you should remove the keyword_marker token filter from the custom analyzer configuration.
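A minimal Python sketch of that advice: it assembles a custom analyzer from a stop filter, an optional keyword_marker filter, and a stemmer, and leaves the keyword_marker filter out when no exclusions are given. The function name and the naming scheme for filters are assumptions modelled on the examples in this section; language-specific filters such as normalization or elision are deliberately omitted, so this is not a full reimplementation of any built-in analyzer.

```python
import json


def rebuilt_analyzer(language, stem_exclusion=None):
    """Sketch of a lowercase -> stop -> [keyword_marker] -> stemmer chain."""
    filters = {
        f"{language}_stop": {"type": "stop", "stopwords": f"_{language}_"},
        f"{language}_stemmer": {"type": "stemmer", "language": language},
    }
    chain = ["lowercase", f"{language}_stop"]
    if stem_exclusion:
        # Only add keyword_marker when there are words to protect from stemming.
        filters[f"{language}_keywords"] = {
            "type": "keyword_marker",
            "keywords": list(stem_exclusion),
        }
        chain.append(f"{language}_keywords")
    chain.append(f"{language}_stemmer")
    return {
        "settings": {
            "analysis": {
                "filter": filters,
                "analyzer": {
                    f"rebuilt_{language}": {"tokenizer": "standard", "filter": chain}
                },
            }
        }
    }


with_exclusions = rebuilt_analyzer("danish", ["eksempel"])
without_exclusions = rebuilt_analyzer("danish")
print(json.dumps(without_exclusions, indent=2, ensure_ascii=False))
```

The keyword_marker filter is placed directly before the stemmer, matching its position in the simple filter chains in this section.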
The arabic analyzer could be reimplemented as a custom analyzer as follows:
PUT /arabic_example
{
"settings": {
"analysis": {
"filter": {
"arabic_stop": {
"type": "stop",
"stopwords": "_arabic_"
},
"arabic_keywords": {
"type": "keyword_marker",
"keywords": ["مثال"]
},
"arabic_stemmer": {
"type": "stemmer",
"language": "arabic"
}
},
"analyzer": {
"rebuilt_arabic": {
"tokenizer": "standard",
"filter": [
"lowercase",
"decimal_digit",
"arabic_stop",
"arabic_normalization",
"arabic_keywords",
"arabic_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The arabic_keywords filter should be removed unless there are words which should be excluded from stemming.
The armenian analyzer could be reimplemented as a custom analyzer as follows:
PUT /armenian_example
{
"settings": {
"analysis": {
"filter": {
"armenian_stop": {
"type": "stop",
"stopwords": "_armenian_"
},
"armenian_keywords": {
"type": "keyword_marker",
"keywords": ["օրինակ"]
},
"armenian_stemmer": {
"type": "stemmer",
"language": "armenian"
}
},
"analyzer": {
"rebuilt_armenian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"armenian_stop",
"armenian_keywords",
"armenian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The armenian_keywords filter should be removed unless there are words which should be excluded from stemming.
The basque analyzer could be reimplemented as a custom analyzer as follows:
PUT /basque_example
{
"settings": {
"analysis": {
"filter": {
"basque_stop": {
"type": "stop",
"stopwords": "_basque_"
},
"basque_keywords": {
"type": "keyword_marker",
"keywords": ["Adibidez"]
},
"basque_stemmer": {
"type": "stemmer",
"language": "basque"
}
},
"analyzer": {
"rebuilt_basque": {
"tokenizer": "standard",
"filter": [
"lowercase",
"basque_stop",
"basque_keywords",
"basque_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The basque_keywords filter should be removed unless there are words which should be excluded from stemming.
The bengali analyzer could be reimplemented as a custom analyzer as follows:
PUT /bengali_example
{
"settings": {
"analysis": {
"filter": {
"bengali_stop": {
"type": "stop",
"stopwords": "_bengali_"
},
"bengali_keywords": {
"type": "keyword_marker",
"keywords": ["উদাহরণ"]
},
"bengali_stemmer": {
"type": "stemmer",
"language": "bengali"
}
},
"analyzer": {
"rebuilt_bengali": {
"tokenizer": "standard",
"filter": [
"lowercase",
"decimal_digit",
"bengali_keywords",
"indic_normalization",
"bengali_normalization",
"bengali_stop",
"bengali_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The bengali_keywords filter should be removed unless there are words which should be excluded from stemming.
The brazilian analyzer could be reimplemented as a custom analyzer as follows:
PUT /brazilian_example
{
"settings": {
"analysis": {
"filter": {
"brazilian_stop": {
"type": "stop",
"stopwords": "_brazilian_"
},
"brazilian_keywords": {
"type": "keyword_marker",
"keywords": ["exemplo"]
},
"brazilian_stemmer": {
"type": "stemmer",
"language": "brazilian"
}
},
"analyzer": {
"rebuilt_brazilian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"brazilian_stop",
"brazilian_keywords",
"brazilian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The brazilian_keywords filter should be removed unless there are words which should be excluded from stemming.
The bulgarian analyzer could be reimplemented as a custom analyzer as follows:
PUT /bulgarian_example
{
"settings": {
"analysis": {
"filter": {
"bulgarian_stop": {
"type": "stop",
"stopwords": "_bulgarian_"
},
"bulgarian_keywords": {
"type": "keyword_marker",
"keywords": ["пример"]
},
"bulgarian_stemmer": {
"type": "stemmer",
"language": "bulgarian"
}
},
"analyzer": {
"rebuilt_bulgarian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"bulgarian_stop",
"bulgarian_keywords",
"bulgarian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The bulgarian_keywords filter should be removed unless there are words which should be excluded from stemming.
The catalan analyzer could be reimplemented as a custom analyzer as follows:
PUT /catalan_example
{
"settings": {
"analysis": {
"filter": {
"catalan_elision": {
"type": "elision",
"articles": [ "d", "l", "m", "n", "s", "t"],
"articles_case": true
},
"catalan_stop": {
"type": "stop",
"stopwords": "_catalan_"
},
"catalan_keywords": {
"type": "keyword_marker",
"keywords": ["example"]
},
"catalan_stemmer": {
"type": "stemmer",
"language": "catalan"
}
},
"analyzer": {
"rebuilt_catalan": {
"tokenizer": "standard",
"filter": [
"catalan_elision",
"lowercase",
"catalan_stop",
"catalan_keywords",
"catalan_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The catalan_keywords filter should be removed unless there are words which should be excluded from stemming.
You may find that icu_analyzer in the ICU analysis plugin works better for CJK text than the cjk analyzer. Experiment with your text and queries.
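One way to run such an experiment is to index the same text under both analyzers and compare query results. The Python sketch below builds a mapping with multi-fields for that purpose; the field name title and the sub-field names cjk and icu are hypothetical, and icu_analyzer is only available when the ICU analysis plugin is installed.

```python
import json


def cjk_comparison_mapping(field="title"):
    """Map one text field with two sub-fields, one per analyzer under test."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "fields": {
                        "cjk": {"type": "text", "analyzer": "cjk"},
                        # Requires the ICU analysis plugin.
                        "icu": {"type": "text", "analyzer": "icu_analyzer"},
                    },
                }
            }
        }
    }


body = cjk_comparison_mapping()
print(json.dumps(body, indent=2))
```

Queries can then target title.cjk or title.icu separately to see which analyzer gives better matches for a given data set.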
The cjk analyzer could be reimplemented as a custom analyzer as follows:
PUT /cjk_example
{
"settings": {
"analysis": {
"filter": {
"english_stop": {
"type": "stop",
"stopwords": [
"a", "and", "are", "as", "at", "be", "but", "by", "for",
"if", "in", "into", "is", "it", "no", "not", "of", "on",
"or", "s", "such", "t", "that", "the", "their", "then",
"there", "these", "they", "this", "to", "was", "will",
"with", "www"
]
}
},
"analyzer": {
"rebuilt_cjk": {
"tokenizer": "standard",
"filter": [
"cjk_width",
"lowercase",
"cjk_bigram",
"english_stop"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters. The default stop words are almost the same as the _english_ set, but not exactly the same.
The czech analyzer could be reimplemented as a custom analyzer as follows:
PUT /czech_example
{
"settings": {
"analysis": {
"filter": {
"czech_stop": {
"type": "stop",
"stopwords": "_czech_"
},
"czech_keywords": {
"type": "keyword_marker",
"keywords": ["příklad"]
},
"czech_stemmer": {
"type": "stemmer",
"language": "czech"
}
},
"analyzer": {
"rebuilt_czech": {
"tokenizer": "standard",
"filter": [
"lowercase",
"czech_stop",
"czech_keywords",
"czech_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The czech_keywords filter should be removed unless there are words which should be excluded from stemming.
The danish analyzer could be reimplemented as a custom analyzer as follows:
PUT /danish_example
{
"settings": {
"analysis": {
"filter": {
"danish_stop": {
"type": "stop",
"stopwords": "_danish_"
},
"danish_keywords": {
"type": "keyword_marker",
"keywords": ["eksempel"]
},
"danish_stemmer": {
"type": "stemmer",
"language": "danish"
}
},
"analyzer": {
"rebuilt_danish": {
"tokenizer": "standard",
"filter": [
"lowercase",
"danish_stop",
"danish_keywords",
"danish_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The danish_keywords filter should be removed unless there are words which should be excluded from stemming.
The dutch analyzer could be reimplemented as a custom analyzer as follows:
PUT /dutch_example
{
"settings": {
"analysis": {
"filter": {
"dutch_stop": {
"type": "stop",
"stopwords": "_dutch_"
},
"dutch_keywords": {
"type": "keyword_marker",
"keywords": ["voorbeeld"]
},
"dutch_stemmer": {
"type": "stemmer",
"language": "dutch"
},
"dutch_override": {
"type": "stemmer_override",
"rules": [
"fiets=>fiets",
"bromfiets=>bromfiets",
"ei=>eier",
"kind=>kinder"
]
}
},
"analyzer": {
"rebuilt_dutch": {
"tokenizer": "standard",
"filter": [
"lowercase",
"dutch_stop",
"dutch_keywords",
"dutch_override",
"dutch_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The dutch_keywords filter should be removed unless there are words which should be excluded from stemming.
The english analyzer could be reimplemented as a custom analyzer as follows:
PUT /english_example
{
"settings": {
"analysis": {
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"english_keywords": {
"type": "keyword_marker",
"keywords": ["example"]
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
},
"english_possessive_stemmer": {
"type": "stemmer",
"language": "possessive_english"
}
},
"analyzer": {
"rebuilt_english": {
"tokenizer": "standard",
"filter": [
"english_possessive_stemmer",
"lowercase",
"english_stop",
"english_keywords",
"english_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The english_keywords filter should be removed unless there are words which should be excluded from stemming.
The estonian analyzer could be reimplemented as a custom analyzer as follows:
PUT /estonian_example
{
"settings": {
"analysis": {
"filter": {
"estonian_stop": {
"type": "stop",
"stopwords": "_estonian_"
},
"estonian_keywords": {
"type": "keyword_marker",
"keywords": ["näide"]
},
"estonian_stemmer": {
"type": "stemmer",
"language": "estonian"
}
},
"analyzer": {
"rebuilt_estonian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"estonian_stop",
"estonian_keywords",
"estonian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The estonian_keywords filter should be removed unless there are words which should be excluded from stemming.
The finnish analyzer could be reimplemented as a custom analyzer as follows:
PUT /finnish_example
{
"settings": {
"analysis": {
"filter": {
"finnish_stop": {
"type": "stop",
"stopwords": "_finnish_"
},
"finnish_keywords": {
"type": "keyword_marker",
"keywords": ["esimerkki"]
},
"finnish_stemmer": {
"type": "stemmer",
"language": "finnish"
}
},
"analyzer": {
"rebuilt_finnish": {
"tokenizer": "standard",
"filter": [
"lowercase",
"finnish_stop",
"finnish_keywords",
"finnish_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The finnish_keywords filter should be removed unless there are words which should be excluded from stemming.
The french analyzer could be reimplemented as a custom analyzer as follows:
PUT /french_example
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision",
"articles_case": true,
"articles": [
"l", "m", "t", "qu", "n", "s",
"j", "d", "c", "jusqu", "quoiqu",
"lorsqu", "puisqu"
]
},
"french_stop": {
"type": "stop",
"stopwords": "_french_"
},
"french_keywords": {
"type": "keyword_marker",
"keywords": ["Example"]
},
"french_stemmer": {
"type": "stemmer",
"language": "light_french"
}
},
"analyzer": {
"rebuilt_french": {
"tokenizer": "standard",
"filter": [
"french_elision",
"lowercase",
"french_stop",
"french_keywords",
"french_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The french_keywords filter should be removed unless there are words which should be excluded from stemming.
The galician analyzer could be reimplemented as a custom analyzer as follows:
PUT /galician_example
{
"settings": {
"analysis": {
"filter": {
"galician_stop": {
"type": "stop",
"stopwords": "_galician_"
},
"galician_keywords": {
"type": "keyword_marker",
"keywords": ["exemplo"]
},
"galician_stemmer": {
"type": "stemmer",
"language": "galician"
}
},
"analyzer": {
"rebuilt_galician": {
"tokenizer": "standard",
"filter": [
"lowercase",
"galician_stop",
"galician_keywords",
"galician_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The galician_keywords filter should be removed unless there are words which should be excluded from stemming.
The german analyzer could be reimplemented as a custom analyzer as follows:
PUT /german_example
{
"settings": {
"analysis": {
"filter": {
"german_stop": {
"type": "stop",
"stopwords": "_german_"
},
"german_keywords": {
"type": "keyword_marker",
"keywords": ["Beispiel"]
},
"german_stemmer": {
"type": "stemmer",
"language": "light_german"
}
},
"analyzer": {
"rebuilt_german": {
"tokenizer": "standard",
"filter": [
"lowercase",
"german_stop",
"german_keywords",
"german_normalization",
"german_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The german_keywords filter should be removed unless there are words which should be excluded from stemming.
The greek analyzer could be reimplemented as a custom analyzer as follows:
PUT /greek_example
{
"settings": {
"analysis": {
"filter": {
"greek_stop": {
"type": "stop",
"stopwords": "_greek_"
},
"greek_lowercase": {
"type": "lowercase",
"language": "greek"
},
"greek_keywords": {
"type": "keyword_marker",
"keywords": ["παράδειγμα"]
},
"greek_stemmer": {
"type": "stemmer",
"language": "greek"
}
},
"analyzer": {
"rebuilt_greek": {
"tokenizer": "standard",
"filter": [
"greek_lowercase",
"greek_stop",
"greek_keywords",
"greek_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The greek_keywords filter should be removed unless there are words which should be excluded from stemming.
The hindi analyzer could be reimplemented as a custom analyzer as follows:
PUT /hindi_example
{
"settings": {
"analysis": {
"filter": {
"hindi_stop": {
"type": "stop",
"stopwords": "_hindi_"
},
"hindi_keywords": {
"type": "keyword_marker",
"keywords": ["उदाहरण"]
},
"hindi_stemmer": {
"type": "stemmer",
"language": "hindi"
}
},
"analyzer": {
"rebuilt_hindi": {
"tokenizer": "standard",
"filter": [
"lowercase",
"decimal_digit",
"hindi_keywords",
"indic_normalization",
"hindi_normalization",
"hindi_stop",
"hindi_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The hindi_keywords filter should be removed unless there are words which should be excluded from stemming.
The hungarian analyzer could be reimplemented as a custom analyzer as follows:
PUT /hungarian_example
{
"settings": {
"analysis": {
"filter": {
"hungarian_stop": {
"type": "stop",
"stopwords": "_hungarian_"
},
"hungarian_keywords": {
"type": "keyword_marker",
"keywords": ["példa"]
},
"hungarian_stemmer": {
"type": "stemmer",
"language": "hungarian"
}
},
"analyzer": {
"rebuilt_hungarian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"hungarian_stop",
"hungarian_keywords",
"hungarian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The hungarian_keywords filter should be removed unless there are words which should be excluded from stemming.
The indonesian analyzer could be reimplemented as a custom analyzer as follows:
PUT /indonesian_example
{
"settings": {
"analysis": {
"filter": {
"indonesian_stop": {
"type": "stop",
"stopwords": "_indonesian_"
},
"indonesian_keywords": {
"type": "keyword_marker",
"keywords": ["contoh"]
},
"indonesian_stemmer": {
"type": "stemmer",
"language": "indonesian"
}
},
"analyzer": {
"rebuilt_indonesian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"indonesian_stop",
"indonesian_keywords",
"indonesian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The indonesian_keywords filter should be removed unless there are words which should be excluded from stemming.
The irish analyzer could be reimplemented as a custom analyzer as follows:
PUT /irish_example
{
"settings": {
"analysis": {
"filter": {
"irish_hyphenation": {
"type": "stop",
"stopwords": [ "h", "n", "t" ],
"ignore_case": true
},
"irish_elision": {
"type": "elision",
"articles": [ "d", "m", "b" ],
"articles_case": true
},
"irish_stop": {
"type": "stop",
"stopwords": "_irish_"
},
"irish_lowercase": {
"type": "lowercase",
"language": "irish"
},
"irish_keywords": {
"type": "keyword_marker",
"keywords": ["sampla"]
},
"irish_stemmer": {
"type": "stemmer",
"language": "irish"
}
},
"analyzer": {
"rebuilt_irish": {
"tokenizer": "standard",
"filter": [
"irish_hyphenation",
"irish_elision",
"irish_lowercase",
"irish_stop",
"irish_keywords",
"irish_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The irish_keywords filter should be removed unless there are words which should be excluded from stemming.
The italian analyzer could be reimplemented as a custom analyzer as follows:
PUT /italian_example
{
"settings": {
"analysis": {
"filter": {
"italian_elision": {
"type": "elision",
"articles": [
"c", "l", "all", "dall", "dell",
"nell", "sull", "coll", "pell",
"gl", "agl", "dagl", "degl", "negl",
"sugl", "un", "m", "t", "s", "v", "d"
],
"articles_case": true
},
"italian_stop": {
"type": "stop",
"stopwords": "_italian_"
},
"italian_keywords": {
"type": "keyword_marker",
"keywords": ["esempio"]
},
"italian_stemmer": {
"type": "stemmer",
"language": "light_italian"
}
},
"analyzer": {
"rebuilt_italian": {
"tokenizer": "standard",
"filter": [
"italian_elision",
"lowercase",
"italian_stop",
"italian_keywords",
"italian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The italian_keywords filter should be removed unless there are words which should be excluded from stemming.
The latvian analyzer could be reimplemented as a custom analyzer as follows:
PUT /latvian_example
{
"settings": {
"analysis": {
"filter": {
"latvian_stop": {
"type": "stop",
"stopwords": "_latvian_"
},
"latvian_keywords": {
"type": "keyword_marker",
"keywords": ["piemērs"]
},
"latvian_stemmer": {
"type": "stemmer",
"language": "latvian"
}
},
"analyzer": {
"rebuilt_latvian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"latvian_stop",
"latvian_keywords",
"latvian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The latvian_keywords filter should be removed unless there are words which should be excluded from stemming.
The lithuanian analyzer could be reimplemented as a custom analyzer as follows:
PUT /lithuanian_example
{
"settings": {
"analysis": {
"filter": {
"lithuanian_stop": {
"type": "stop",
"stopwords": "_lithuanian_"
},
"lithuanian_keywords": {
"type": "keyword_marker",
"keywords": ["pavyzdys"]
},
"lithuanian_stemmer": {
"type": "stemmer",
"language": "lithuanian"
}
},
"analyzer": {
"rebuilt_lithuanian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"lithuanian_stop",
"lithuanian_keywords",
"lithuanian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The lithuanian_keywords filter should be removed unless there are words which should be excluded from stemming.
The norwegian analyzer could be reimplemented as a custom analyzer as follows:
PUT /norwegian_example
{
"settings": {
"analysis": {
"filter": {
"norwegian_stop": {
"type": "stop",
"stopwords": "_norwegian_"
},
"norwegian_keywords": {
"type": "keyword_marker",
"keywords": ["eksempel"]
},
"norwegian_stemmer": {
"type": "stemmer",
"language": "norwegian"
}
},
"analyzer": {
"rebuilt_norwegian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"norwegian_stop",
"norwegian_keywords",
"norwegian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The norwegian_keywords filter should be removed unless there are words which should be excluded from stemming.
The persian analyzer could be reimplemented as a custom analyzer as follows:
PUT /persian_example
{
"settings": {
"analysis": {
"char_filter": {
"zero_width_spaces": {
"type": "mapping",
"mappings": [ "\\u200C=>\\u0020"]
}
},
"filter": {
"persian_stop": {
"type": "stop",
"stopwords": "_persian_"
}
},
"analyzer": {
"rebuilt_persian": {
"tokenizer": "standard",
"char_filter": [ "zero_width_spaces" ],
"filter": [
"lowercase",
"decimal_digit",
"arabic_normalization",
"persian_normalization",
"persian_stop",
"persian_stem"
]
}
}
}
}
}
- The zero_width_spaces character filter replaces zero-width non-joiners with an ASCII space.
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
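The effect of the zero_width_spaces character filter can be imitated in plain Python. This is only an approximation for illustration: it performs the same single-character substitution as the mapping rule, not the rest of the analysis chain, and the sample word is an arbitrary example.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def replace_zero_width_non_joiners(text):
    """Mimic the mapping rule "\\u200C=>\\u0020": ZWNJ becomes an ASCII space."""
    return text.replace(ZWNJ, "\u0020")


# A Persian word written with a zero-width non-joiner between its two parts.
sample = "می" + ZWNJ + "روم"
cleaned = replace_zero_width_non_joiners(sample)

# After the substitution, splitting on spaces separates the two parts.
print(cleaned.split(" "))
```

Because the substitution happens before tokenization, the two parts of the word reach the tokenizer separated by an ordinary space.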
The portuguese analyzer could be reimplemented as a custom analyzer as follows:
PUT /portuguese_example
{
"settings": {
"analysis": {
"filter": {
"portuguese_stop": {
"type": "stop",
"stopwords": "_portuguese_"
},
"portuguese_keywords": {
"type": "keyword_marker",
"keywords": ["exemplo"]
},
"portuguese_stemmer": {
"type": "stemmer",
"language": "light_portuguese"
}
},
"analyzer": {
"rebuilt_portuguese": {
"tokenizer": "standard",
"filter": [
"lowercase",
"portuguese_stop",
"portuguese_keywords",
"portuguese_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The portuguese_keywords filter should be removed unless there are words which should be excluded from stemming.
The romanian analyzer could be reimplemented as a custom analyzer as follows:
PUT /romanian_example
{
"settings": {
"analysis": {
"filter": {
"romanian_stop": {
"type": "stop",
"stopwords": "_romanian_"
},
"romanian_keywords": {
"type": "keyword_marker",
"keywords": ["exemplu"]
},
"romanian_stemmer": {
"type": "stemmer",
"language": "romanian"
}
},
"analyzer": {
"rebuilt_romanian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"romanian_stop",
"romanian_keywords",
"romanian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The romanian_keywords filter should be removed unless there are words which should be excluded from stemming.
The russian analyzer could be reimplemented as a custom analyzer as follows:
PUT /russian_example
{
"settings": {
"analysis": {
"filter": {
"russian_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"russian_keywords": {
"type": "keyword_marker",
"keywords": ["пример"]
},
"russian_stemmer": {
"type": "stemmer",
"language": "russian"
}
},
"analyzer": {
"rebuilt_russian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"russian_stop",
"russian_keywords",
"russian_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The russian_keywords filter should be removed unless there are words which should be excluded from stemming.
The serbian analyzer could be reimplemented as a custom analyzer as follows:
PUT /serbian_example
{
"settings": {
"analysis": {
"filter": {
"serbian_stop": {
"type": "stop",
"stopwords": "_serbian_"
},
"serbian_keywords": {
"type": "keyword_marker",
"keywords": ["пример"]
},
"serbian_stemmer": {
"type": "stemmer",
"language": "serbian"
}
},
"analyzer": {
"rebuilt_serbian": {
"tokenizer": "standard",
"filter": [
"lowercase",
"serbian_stop",
"serbian_keywords",
"serbian_stemmer",
"serbian_normalization"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The serbian_keywords filter should be removed unless there are words which should be excluded from stemming.
The sorani analyzer could be reimplemented as a custom analyzer as follows:
PUT /sorani_example
{
"settings": {
"analysis": {
"filter": {
"sorani_stop": {
"type": "stop",
"stopwords": "_sorani_"
},
"sorani_keywords": {
"type": "keyword_marker",
"keywords": ["mînak"]
},
"sorani_stemmer": {
"type": "stemmer",
"language": "sorani"
}
},
"analyzer": {
"rebuilt_sorani": {
"tokenizer": "standard",
"filter": [
"sorani_normalization",
"lowercase",
"decimal_digit",
"sorani_stop",
"sorani_keywords",
"sorani_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The sorani_keywords filter should be removed unless there are words which should be excluded from stemming.
The spanish analyzer could be reimplemented as a custom analyzer as follows:
PUT /spanish_example
{
"settings": {
"analysis": {
"filter": {
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_keywords": {
"type": "keyword_marker",
"keywords": ["ejemplo"]
},
"spanish_stemmer": {
"type": "stemmer",
"language": "light_spanish"
}
},
"analyzer": {
"rebuilt_spanish": {
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"spanish_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The spanish_keywords filter should be removed unless there are words which should be excluded from stemming.
The swedish analyzer could be reimplemented as a custom analyzer as follows:
PUT /swedish_example
{
"settings": {
"analysis": {
"filter": {
"swedish_stop": {
"type": "stop",
"stopwords": "_swedish_"
},
"swedish_keywords": {
"type": "keyword_marker",
"keywords": ["exempel"]
},
"swedish_stemmer": {
"type": "stemmer",
"language": "swedish"
}
},
"analyzer": {
"rebuilt_swedish": {
"tokenizer": "standard",
"filter": [
"lowercase",
"swedish_stop",
"swedish_keywords",
"swedish_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The swedish_keywords filter should be removed unless there are words which should be excluded from stemming.
The turkish analyzer could be reimplemented as a custom analyzer as follows:
PUT /turkish_example
{
"settings": {
"analysis": {
"filter": {
"turkish_stop": {
"type": "stop",
"stopwords": "_turkish_"
},
"turkish_lowercase": {
"type": "lowercase",
"language": "turkish"
},
"turkish_keywords": {
"type": "keyword_marker",
"keywords": ["örnek"]
},
"turkish_stemmer": {
"type": "stemmer",
"language": "turkish"
}
},
"analyzer": {
"rebuilt_turkish": {
"tokenizer": "standard",
"filter": [
"apostrophe",
"turkish_lowercase",
"turkish_stop",
"turkish_keywords",
"turkish_stemmer"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.
- The turkish_keywords filter should be removed unless there are words which should be excluded from stemming.
The thai analyzer could be reimplemented as a custom analyzer as follows:
PUT /thai_example
{
"settings": {
"analysis": {
"filter": {
"thai_stop": {
"type": "stop",
"stopwords": "_thai_"
}
},
"analyzer": {
"rebuilt_thai": {
"tokenizer": "thai",
"filter": [
"lowercase",
"decimal_digit",
"thai_stop"
]
}
}
}
}
}
- The default stopwords can be overridden with the stopwords or stopwords_path parameters.