﻿---
title: Hunspell token filter
description: Provides dictionary stemming based on a provided Hunspell dictionary. The hunspell filter requires configuration of one or more language-specific Hunspell...
url: https://www.elastic.co/elastic/docs-builder/docs/3028/reference/text-analysis/analysis-hunspell-tokenfilter
products:
  - Elasticsearch
---

# Hunspell token filter
Provides [dictionary stemming](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3028/manage-data/data-store/text-analysis/stemming#dictionary-stemmers) based on a provided [Hunspell dictionary](https://en.wikipedia.org/wiki/Hunspell). The `hunspell` filter requires [configuration](#analysis-hunspell-tokenfilter-dictionary-config) of one or more language-specific Hunspell dictionaries.
This filter uses Lucene’s [HunspellStemFilter](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/hunspell/HunspellStemFilter.md).
<tip>
  If available, we recommend trying an algorithmic stemmer for your language before using the `hunspell` token filter. In practice, algorithmic stemmers typically outperform dictionary stemmers. See [Dictionary stemmers](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3028/manage-data/data-store/text-analysis/stemming#dictionary-stemmers).
</tip>


## Configure Hunspell dictionaries

Hunspell dictionaries are stored and detected on a dedicated `hunspell` directory on the filesystem: `<$ES_PATH_CONF>/hunspell`. Each dictionary is expected to have its own directory, named after its associated language and locale (e.g., `pt_BR`, `en_GB`). This dictionary directory is expected to hold a single `.aff` and one or more `.dic` files, all of which will automatically be picked up. For example, the following directory layout will define the `en_US` dictionary:
```txt
- config
    |-- hunspell
    |    |-- en_US
    |    |    |-- en_US.dic
    |    |    |-- en_US.aff
```

Each dictionary can be configured with one setting:

<definitions>
  <definition term="ignore_case">
    (Static, Boolean) If true, dictionary matching will be case insensitive. Defaults to `false`.
    This setting can be configured globally in `elasticsearch.yml` using `indices.analysis.hunspell.dictionary.ignore_case`.
    To configure the setting for a specific locale, use the `indices.analysis.hunspell.dictionary.<locale>.ignore_case` setting (e.g., for the `en_US` (American English) locale, the setting is `indices.analysis.hunspell.dictionary.en_US.ignore_case`).
    You can also add a `settings.yml` file under the dictionary directory which holds these settings. This overrides any other `ignore_case` settings defined in `elasticsearch.yml`.
  </definition>
</definitions>


## Example

The following analyze API request uses the `hunspell` filter to stem `the foxes jumping quickly` to `the fox jump quick`.
The request specifies the `en_US` locale, meaning that the `.aff` and `.dic` files in the `<$ES_PATH_CONF>/hunspell/en_US` directory are used for the Hunspell dictionary.
```json

{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "hunspell",
      "locale": "en_US"
    }
  ],
  "text": "the foxes jumping quickly"
}
```

The filter produces the following tokens:
```text
[ the, fox, jump, quick ]
```


## Configurable parameters


<definitions>
  <definition term="dictionary">
    (Optional, string or array of strings) One or more `.dic` files (e.g, `en_US.dic, my_custom.dic`) to use for the Hunspell dictionary.
    By default, the `hunspell` filter uses all `.dic` files in the `<$ES_PATH_CONF>/hunspell/<locale>` directory specified using the `lang`, `language`, or `locale` parameter.
  </definition>
  <definition term="dedup">
    (Optional, Boolean) If `true`, duplicate tokens are removed from the filter’s output. Defaults to `true`.
  </definition>
  <definition term="lang">
    (Required*, string) An alias for the [`locale` parameter](#analysis-hunspell-tokenfilter-locale-param).
    If this parameter is not specified, the `language` or `locale` parameter is required.
  </definition>
  <definition term="language">
    (Required*, string) An alias for the [`locale` parameter](#analysis-hunspell-tokenfilter-locale-param).
    If this parameter is not specified, the `lang` or `locale` parameter is required.
  </definition>
</definitions>


<definitions>
  <definition term="locale">
    (Required*, string) Locale directory used to specify the `.aff` and `.dic` files for a Hunspell dictionary. See [Configure Hunspell dictionaries](#analysis-hunspell-tokenfilter-dictionary-config).
    If this parameter is not specified, the `lang` or `language` parameter is required.
  </definition>
  <definition term="longest_only">
    (Optional, Boolean) If `true`, only the longest stemmed version of each token is included in the output. If `false`, all stemmed versions of the token are included. Defaults to `false`.
  </definition>
</definitions>


## Customize and add to an analyzer

To customize the `hunspell` filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following [create index API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-create) request uses a custom `hunspell` filter, `my_en_US_dict_stemmer`, to configure a new [custom analyzer](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3028/manage-data/data-store/text-analysis/create-custom-analyzer).
The `my_en_US_dict_stemmer` filter uses a `locale` of `en_US`, meaning that the `.aff` and `.dic` files in the `<$ES_PATH_CONF>/hunspell/en_US` directory are used. The filter also includes a `dedup` argument of `false`, meaning that duplicate tokens added from the dictionary are not removed from the filter’s output.
```json

{
  "settings": {
    "analysis": {
      "analyzer": {
        "en": {
          "tokenizer": "standard",
          "filter": [ "my_en_US_dict_stemmer" ]
        }
      },
      "filter": {
        "my_en_US_dict_stemmer": {
          "type": "hunspell",
          "locale": "en_US",
          "dedup": false
        }
      }
    }
  }
}
```


## Settings

In addition to the [`ignore_case` settings](#analysis-hunspell-ignore-case-settings), you can configure the following global settings for the `hunspell` filter using `elasticsearch.yml`:
<definitions>
  <definition term="indices.analysis.hunspell.dictionary.lazy">
    (Static, Boolean) If `true`, the loading of Hunspell dictionaries is deferred until a dictionary is used. If `false`, the dictionary directory is checked for dictionaries when the node starts, and any dictionaries are automatically loaded. Defaults to `false`.
  </definition>
</definitions>