---
title: Fingerprint analyzer
description: The fingerprint analyzer implements a fingerprinting algorithm which is used by the OpenRefine project to assist in clustering. Input text is lowercased,...
url: https://www.elastic.co/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-fingerprint-analyzer
products:
  - Elasticsearch
---

# Fingerprint analyzer
The `fingerprint` analyzer implements a [fingerprinting algorithm](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth#fingerprint) which is used by the OpenRefine project to assist in clustering.
Input text is lowercased, normalized to remove extended characters, sorted, deduplicated and concatenated into a single token. If a stopword list is configured, stop words will also be removed.
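
The steps above can be sketched in Python (a rough approximation for illustration, not the actual Lucene implementation; Unicode NFKD folding stands in for the ASCII folding filter, and a regex stands in for the standard tokenizer):

```python
import re
import unicodedata

def fingerprint(text: str, separator: str = " ") -> str:
    """Sketch of the fingerprint algorithm: lowercase, fold extended
    characters to ASCII, split, deduplicate, sort, and concatenate."""
    # Lowercase and strip accents/extended characters (stand-in for ASCII folding)
    folded = unicodedata.normalize("NFKD", text.lower())
    ascii_text = folded.encode("ascii", "ignore").decode("ascii")
    # Split into terms (stand-in for the standard tokenizer)
    terms = re.findall(r"[a-z0-9]+", ascii_text)
    # Deduplicate, sort, and join into a single token
    return separator.join(sorted(set(terms)))

print(fingerprint("Yes yes, Gödel said this sentence is consistent and."))
# → and consistent godel is said sentence this yes
```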

## Example output

```json
{
  "analyzer": "fingerprint",
  "text": "Yes yes, Gödel said this sentence is consistent and."
}
```

The above sentence would produce the following single term:
```text
[ and consistent godel is said sentence this yes ]
```


## Configuration

The `fingerprint` analyzer accepts the following parameters:
<definitions>
  <definition term="separator">
    The character to use to concatenate the terms. Defaults to a space.
  </definition>
  <definition term="max_output_size">
    The maximum token size to emit. Defaults to `255`. Tokens larger than this size will be discarded.
  </definition>
  <definition term="stopwords">
    A pre-defined stop words list like `_english_` or an array containing a list of stop words. Defaults to `_none_`.
  </definition>
  <definition term="stopwords_path">
    The path to a file containing stop words.
  </definition>
</definitions>

See the [Stop Token Filter](https://www.elastic.co/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-stop-tokenfilter) for more information about stop word configuration.
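
To illustrate how the parameters interact, here is a hedged Python sketch extending the algorithm with `separator`, `max_output_size`, and `stopwords` (the `ENGLISH_STOPWORDS` set is a hypothetical subset of `_english_`, not the real list):

```python
import re
import unicodedata

# Hypothetical subset of the `_english_` pre-defined stop word list
ENGLISH_STOPWORDS = frozenset({"and", "is", "this", "the", "a", "an", "of", "to"})

def fingerprint(text, separator=" ", max_output_size=255, stopwords=frozenset()):
    """Fingerprint sketch with configurable separator, size limit, and stop words."""
    folded = unicodedata.normalize("NFKD", text.lower())
    ascii_text = folded.encode("ascii", "ignore").decode("ascii")
    # Drop stop words before deduplicating and sorting
    terms = [t for t in re.findall(r"[a-z0-9]+", ascii_text) if t not in stopwords]
    token = separator.join(sorted(set(terms)))
    # Tokens larger than max_output_size are discarded entirely, not truncated
    return token if len(token) <= max_output_size else None

print(fingerprint("Yes yes, Gödel said this sentence is consistent and.",
                  stopwords=ENGLISH_STOPWORDS))
# → consistent godel said sentence yes
```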

## Example configuration

In this example, we configure the `fingerprint` analyzer to use the pre-defined list of English stop words:
```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_fingerprint_analyzer": {
          "type": "fingerprint",
          "stopwords": "_english_"
        }
      }
    }
  }
}
```

We can then analyze text with the new analyzer:
```json
{
  "analyzer": "my_fingerprint_analyzer",
  "text": "Yes yes, Gödel said this sentence is consistent and."
}
```

The above example produces the following term:
```text
[ consistent godel said sentence yes ]
```


## Definition

The `fingerprint` analyzer consists of:
<definitions>
  <definition term="Tokenizer">
    - [Standard Tokenizer](https://www.elastic.co/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-standard-tokenizer)
  </definition>
  <definition term="Token Filters (in order)">
    - [Lower Case Token Filter](https://www.elastic.co/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-lowercase-tokenfilter)
    - [ASCII folding](https://www.elastic.co/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-asciifolding-tokenfilter)
    - [Stop Token Filter](https://www.elastic.co/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-stop-tokenfilter) (disabled by default)
    - [Fingerprint](https://www.elastic.co/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-fingerprint-tokenfilter)
  </definition>
</definitions>
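
The analysis chain above can be modeled as a sequence of functions applied in order (an illustrative sketch of the filter ordering only, not the Lucene implementation):

```python
import re
import unicodedata

def standard_tokenize(text):
    # Stand-in for the standard tokenizer
    return re.findall(r"\w+", text)

def lowercase(tokens):
    return [t.lower() for t in tokens]

def ascii_fold(tokens):
    # Stand-in for the ASCII folding filter
    return [unicodedata.normalize("NFKD", t).encode("ascii", "ignore").decode("ascii")
            for t in tokens]

def stop(tokens, stopwords=frozenset()):
    # Disabled by default: the stop word set is empty
    return [t for t in tokens if t not in stopwords]

def fingerprint_filter(tokens, separator=" "):
    # Deduplicate, sort, and emit a single concatenated token
    return [separator.join(sorted(set(tokens)))]

tokens = standard_tokenize("Yes yes, Gödel said this sentence is consistent and.")
for step in (lowercase, ascii_fold, stop, fingerprint_filter):
    tokens = step(tokens)
print(tokens)  # → ['and consistent godel is said sentence this yes']
```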

If you need to customize the `fingerprint` analyzer beyond the configuration parameters, you must recreate it as a `custom` analyzer and modify it, usually by adding token filters. The following configuration recreates the built-in `fingerprint` analyzer, and you can use it as a starting point for further customization:
```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_fingerprint": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "fingerprint"
          ]
        }
      }
    }
  }
}
```