---
title: Inference processor
description: Uses a pre-trained data frame analytics model or a model deployed for natural language processing tasks to infer against the data that is being ingested...
url: https://www.elastic.co/elastic/docs-builder/docs/3016/reference/enrich-processor/inference-processor
products:
  - Elasticsearch
---

# Inference processor
Uses a pre-trained data frame analytics model or a model deployed for natural language processing tasks to infer against the data that is being ingested in the pipeline.


| Name               | Required | Default                                   | Description                                                                                                                                                                                                                                                                |
|--------------------|----------|-------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `model_id`         | yes      | -                                         | (String) An inference ID, a model deployment ID, a trained model ID or an alias.                                                                                                                                                                                           |
| `input_output`     | no       | -                                         | (List) Input fields for inference and output (destination) fields for the inference results. This option is incompatible with the `target_field` and `field_map` options.                                                                                                  |
| `target_field`     | no       | `ml.inference.<processor_tag>`            | (String) Field added to incoming documents to contain results objects.                                                                                                                                                                                                     |
| `field_map`        | no       | If defined the model’s default field map  | (Object) Maps the document field names to the known field names of the model. This mapping takes precedence over any default mappings provided in the model configuration.                                                                                                 |
| `inference_config` | no       | The default settings defined in the model | (Object) Contains the inference type and its options.                                                                                                                                                                                                                      |
| `ignore_missing`   | no       | `false`                                   | (Boolean) If `true` and any of the input fields defined in `input_output` are missing, those missing fields are quietly ignored; otherwise, a missing field causes a failure. Applies only when `input_output` is used to explicitly list the input fields. |
| `description`      | no       | -                                         | Description of the processor. Useful for describing the purpose of the processor or its configuration.                                                                                                                                                                     |
| `if`               | no       | -                                         | Conditionally execute the processor. See [Conditionally run a processor](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/manage-data/ingest/transform-enrich/ingest-pipelines#conditionally-run-processor).                                             |
| `ignore_failure`   | no       | `false`                                   | Ignore failures for the processor. See [Handling pipeline failures](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/manage-data/ingest/transform-enrich/ingest-pipelines#handling-pipeline-failures).                                                   |
| `on_failure`       | no       | -                                         | Handle failures for the processor. See [Handling pipeline failures](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/manage-data/ingest/transform-enrich/ingest-pipelines#handling-pipeline-failures).                                                   |
| `tag`              | no       | -                                         | Identifier for the processor. Useful for debugging and metrics.                                                                                                                                                                                                            |

<important>
  - You cannot use the `input_output` field with the `target_field` and `field_map` fields. For NLP models, use the `input_output` option. For data frame analytics models, use the `target_field` and `field_map` options.
  - Each inference input field must be a single string, not an array of strings.
  - The `input_field` is processed as-is and ignores any [index mapping](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/manage-data/data-store/mapping)'s [analyzers](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/manage-data/data-store/text-analysis) at inference time.
</important>


## Configuring input and output fields

Select the `content` field for inference and write the result to `content_embedding`.
<important>
  If the specified `output_field` already exists in the ingest document, it won’t be overwritten. The inference results are appended to the existing fields within `output_field`, which can lead to duplicate fields and potential errors. To avoid this, use a unique `output_field` name that does not clash with any existing fields.
</important>

```js
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "input_output": [
        {
            "input_field": "content",
            "output_field": "content_embedding"
        }
    ]
  }
}
```
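The processor definitions on this page are shown in isolation; in practice the processor runs as part of an ingest pipeline. As a minimal sketch, the configuration above could be registered like this (the pipeline name is hypothetical):

```js
PUT _ingest/pipeline/content-embedding-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "model_deployment_for_inference",
        "input_output": [
            {
                "input_field": "content",
                "output_field": "content_embedding"
            }
        ]
      }
    }
  ]
}
```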


## Configuring multiple inputs

The `content` and `title` fields are read from the incoming document and sent to the model for inference. The inference output is written to `content_embedding` and `title_embedding`, respectively.
```js
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "input_output": [
        {
            "input_field": "content",
            "output_field": "content_embedding"
        },
        {
            "input_field": "title",
            "output_field": "title_embedding"
        }
    ]
  }
}
```

Selecting the input fields with `input_output` is incompatible with the `target_field` and `field_map` options.
Data frame analytics models must use `target_field` to specify the root location where results are written, and optionally a `field_map` to map field names in the input document to the model's input fields.
```js
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "target_field": "FlightDelayMin_prediction_infer",
    "field_map": {
      "your_field": "my_field"
    },
    "inference_config": { "regression": {} }
  }
}
```


## Classification configuration options

Classification configuration for inference.
<definitions>
  <definition term="num_top_classes">
    (Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
  </definition>
  <definition term="num_top_feature_importance_values">
    (Optional, integer) Specifies the maximum number of [feature importance](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/explore-analyze/machine-learning/data-frame-analytics/ml-feature-importance) values per document. Defaults to 0 which means no feature importance calculation occurs.
  </definition>
  <definition term="results_field">
    (Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the `results_field` value of the data frame analytics job that was used to train the model, which defaults to `<dependent_variable>_prediction`.
  </definition>
  <definition term="top_classes_results_field">
    (Optional, string) Specifies the field to which the top classes are written. Defaults to `top_classes`.
  </definition>
  <definition term="prediction_field_type">
    (Optional, string) Specifies the type of the predicted field to write. Valid values are: `string`, `number`, `boolean`. When `boolean` is provided `1.0` is transformed to `true` and `0.0` to `false`.
  </definition>
</definitions>
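As a sketch of how these options combine, the following hypothetical configuration requests the top two class predictions and a boolean prediction value (the model ID and field names are assumptions, not a real model):

```js
{
  "inference": {
    "model_id": "my_classification_model",
    "target_field": "ml.inference.delay",
    "inference_config": {
      "classification": {
        "num_top_classes": 2,
        "top_classes_results_field": "probabilities",
        "prediction_field_type": "boolean"
      }
    }
  }
}
```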


## Fill mask configuration options

<definitions>
  <definition term="num_top_classes">
    (Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
  </definition>
  <definition term="results_field">
    (Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the `results_field` value of the data frame analytics job that was used to train the model, which defaults to `<dependent_variable>_prediction`.
  </definition>
  <definition term="tokenization">
    (Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is `bert`. Valid tokenization values are:
    - `bert`: Use for BERT-style models
    - `deberta_v2`: Use for DeBERTa v2 and v3-style models
    - `mpnet`: Use for MPNet-style models
    - `roberta`: Use for RoBERTa-style and BART-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `xlm_roberta`: Use for XLMRoBERTa-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `bert_ja`: Use for BERT-style models trained for the Japanese language.
  </definition>
</definitions>

<dropdown title="Properties of tokenization">
  <definitions>
    <definition term="bert">
      (Optional, object) BERT-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of bert">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
    <definition term="deberta_v2">
      (Optional, object) DeBERTa-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of deberta_v2">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `balanced`: One or both of the first and second sequences may be truncated so as to balance the tokens included from both sequences.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
      </dropdown>
    </definition>
    <definition term="roberta">
      (Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of roberta">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
    <definition term="mpnet">
      (Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of mpnet">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
  </definitions>
</dropdown>
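A hypothetical fill mask configuration that requests the five most likely completions and overrides the default tokenization (the model ID and field names are assumptions):

```js
{
  "inference": {
    "model_id": "my_fill_mask_model",
    "input_output": [
        {
            "input_field": "masked_text",
            "output_field": "predicted_text"
        }
    ],
    "inference_config": {
      "fill_mask": {
        "num_top_classes": 5,
        "tokenization": {
          "roberta": {
            "truncate": "first"
          }
        }
      }
    }
  }
}
```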


## NER configuration options

<definitions>
  <definition term="results_field">
    (Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the `results_field` value of the data frame analytics job that was used to train the model, which defaults to `<dependent_variable>_prediction`.
  </definition>
  <definition term="tokenization">
    (Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is `bert`. Valid tokenization values are:
    - `bert`: Use for BERT-style models
    - `deberta_v2`: Use for DeBERTa v2 and v3-style models
    - `mpnet`: Use for MPNet-style models
    - `roberta`: Use for RoBERTa-style and BART-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `xlm_roberta`: Use for XLMRoBERTa-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `bert_ja`: Use for BERT-style models trained for the Japanese language.
  </definition>
</definitions>

<dropdown title="Properties of tokenization">
  <definitions>
    <definition term="bert">
      (Optional, object) BERT-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of bert">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
    <definition term="deberta_v2">
      (Optional, object) DeBERTa-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of deberta_v2">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `balanced`: One or both of the first and second sequences may be truncated so as to balance the tokens included from both sequences.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
      </dropdown>
    </definition>
    <definition term="roberta">
      (Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of roberta">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
    <definition term="mpnet">
      (Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of mpnet">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
  </definitions>
</dropdown>
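A minimal NER sketch, assuming a hypothetical model ID and field names:

```js
{
  "inference": {
    "model_id": "my_ner_model",
    "input_output": [
        {
            "input_field": "message",
            "output_field": "entities_prediction"
        }
    ],
    "inference_config": {
      "ner": {
        "tokenization": {
          "bert": {
            "truncate": "first"
          }
        }
      }
    }
  }
}
```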


## Regression configuration options

Regression configuration for inference.
<definitions>
  <definition term="results_field">
    (Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the `results_field` value of the data frame analytics job that was used to train the model, which defaults to `<dependent_variable>_prediction`.
  </definition>
  <definition term="num_top_feature_importance_values">
    (Optional, integer) Specifies the maximum number of [feature importance](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/explore-analyze/machine-learning/data-frame-analytics/ml-feature-importance) values per document. By default, it is zero and no feature importance calculation occurs.
  </definition>
</definitions>
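For example, a hypothetical regression configuration that renames the prediction field and returns the three most important features per document (the model ID and field names are assumptions):

```js
{
  "inference": {
    "model_id": "my_regression_model",
    "target_field": "ml.inference.price",
    "inference_config": {
      "regression": {
        "results_field": "predicted_price",
        "num_top_feature_importance_values": 3
      }
    }
  }
}
```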


## Text classification configuration options

<definitions>
  <definition term="classification_labels">
    (Optional, string) An array of classification labels.
  </definition>
  <definition term="num_top_classes">
    (Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
  </definition>
  <definition term="results_field">
    (Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the `results_field` value of the data frame analytics job that was used to train the model, which defaults to `<dependent_variable>_prediction`.
  </definition>
  <definition term="tokenization">
    (Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is `bert`. Valid tokenization values are:
    - `bert`: Use for BERT-style models
    - `deberta_v2`: Use for DeBERTa v2 and v3-style models
    - `mpnet`: Use for MPNet-style models
    - `roberta`: Use for RoBERTa-style and BART-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `xlm_roberta`: Use for XLMRoBERTa-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `bert_ja`: Use for BERT-style models trained for the Japanese language.
  </definition>
</definitions>

<dropdown title="Properties of tokenization">
  <definitions>
    <definition term="bert">
      (Optional, object) BERT-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of bert">
        `span`
        (Optional, integer) When `truncate` is `none`, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.
        The default value is `-1`, indicating no windowing or spanning occurs.
        <note>
          When your typical input is just slightly larger than `max_sequence_length`, it may be best to simply truncate; there will be very little information in the second subsequence.
        </note>
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
    <definition term="deberta_v2">
      (Optional, object) DeBERTa-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of deberta_v2">
        `span`
        (Optional, integer) When `truncate` is `none`, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.
        The default value is `-1`, indicating no windowing or spanning occurs.
        <note>
          When your typical input is just slightly larger than `max_sequence_length`, it may be best to simply truncate; there will be very little information in the second subsequence.
        </note>
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `balanced`: One or both of the first and second sequences may be truncated so as to balance the tokens included from both sequences.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
      </dropdown>
    </definition>
    <definition term="roberta">
      (Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of roberta">
        `span`
        (Optional, integer) When `truncate` is `none`, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.
        The default value is `-1`, indicating no windowing or spanning occurs.
        <note>
          When your typical input is just slightly larger than `max_sequence_length`, it may be best to simply truncate; there will be very little information in the second subsequence.
        </note>
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
    <definition term="mpnet">
      (Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of mpnet">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
  </definitions>
</dropdown>
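A hypothetical text classification configuration that relabels a two-class sentiment model and returns both class probabilities (the model ID, labels, and field names are assumptions):

```js
{
  "inference": {
    "model_id": "my_sentiment_model",
    "input_output": [
        {
            "input_field": "review",
            "output_field": "sentiment"
        }
    ],
    "inference_config": {
      "text_classification": {
        "classification_labels": ["POSITIVE", "NEGATIVE"],
        "num_top_classes": 2
      }
    }
  }
}
```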


## Text embedding configuration options

<definitions>
  <definition term="results_field">
    (Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the `results_field` value of the data frame analytics job that was used to train the model, which defaults to `<dependent_variable>_prediction`.
  </definition>
  <definition term="tokenization">
    (Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is `bert`. Valid tokenization values are:
    - `bert`: Use for BERT-style models
    - `deberta_v2`: Use for DeBERTa v2 and v3-style models
    - `mpnet`: Use for MPNet-style models
    - `roberta`: Use for RoBERTa-style and BART-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `xlm_roberta`: Use for XLMRoBERTa-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `bert_ja`: Use for BERT-style models trained for the Japanese language.
  </definition>
</definitions>

<dropdown title="Properties of tokenization">
  <definitions>
    <definition term="bert">
      (Optional, object) BERT-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of bert">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
    <definition term="deberta_v2">
      (Optional, object) DeBERTa-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of deberta_v2">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `balanced`: One or both of the first and second sequences may be truncated so as to balance the tokens included from both sequences.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
      </dropdown>
    </definition>
    <definition term="roberta">
      (Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of roberta">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
    <definition term="mpnet">
      (Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of mpnet">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
  </definitions>
</dropdown>
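A hypothetical text embedding configuration that overrides the default truncation behavior (the model ID and field names are assumptions):

```js
{
  "inference": {
    "model_id": "my_text_embedding_model",
    "input_output": [
        {
            "input_field": "content",
            "output_field": "content_embedding"
        }
    ],
    "inference_config": {
      "text_embedding": {
        "tokenization": {
          "bert": {
            "truncate": "first"
          }
        }
      }
    }
  }
}
```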


## Text expansion configuration options

<definitions>
  <definition term="results_field">
    (Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the `results_field` value of the data frame analytics job that was used to train the model, which defaults to `<dependent_variable>_prediction`.
  </definition>
  <definition term="tokenization">
    (Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is `bert`. Valid tokenization values are:
    - `bert`: Use for BERT-style models
    - `deberta_v2`: Use for DeBERTa v2 and v3-style models
    - `mpnet`: Use for MPNet-style models
    - `roberta`: Use for RoBERTa-style and BART-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `xlm_roberta`: Use for XLMRoBERTa-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `bert_ja`: Use for BERT-style models trained for the Japanese language.
  </definition>
</definitions>
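A hypothetical text expansion configuration that disables truncation and spans long input across overlapping subsequences instead (the model ID, field names, and span value are assumptions):

```js
{
  "inference": {
    "model_id": "my_text_expansion_model",
    "input_output": [
        {
            "input_field": "content",
            "output_field": "content_expansion"
        }
    ],
    "inference_config": {
      "text_expansion": {
        "tokenization": {
          "bert": {
            "truncate": "none",
            "span": 128
          }
        }
      }
    }
  }
}
```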

<dropdown title="Properties of tokenization">
  <definitions>
    <definition term="bert">
      (Optional, object) BERT-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of bert">
        `span`
        (Optional, integer) When `truncate` is `none`, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.
        The default value is `-1`, indicating no windowing or spanning occurs.
        <note>
          When your typical input is just slightly larger than `max_sequence_length`, it may be best to simply truncate; there will be very little information in the second subsequence.
        </note>
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
    <definition term="deberta_v2">
      (Optional, object) DeBERTa-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of deberta_v2">
        `span`
        (Optional, integer) When `truncate` is `none`, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.
        The default value is `-1`, indicating no windowing or spanning occurs.
        <note>
          When your typical input is just slightly larger than `max_sequence_length`, it may be best to simply truncate; there will be very little information in the second subsequence.
        </note>
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `balanced`: One or both of the first and second sequences may be truncated so as to balance the tokens included from both sequences.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
      </dropdown>
    </definition>
    <definition term="roberta">
      (Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of roberta">
        `span`
        (Optional, integer) When `truncate` is `none`, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.
        The default value is `-1`, indicating no windowing or spanning occurs.
        <note>
          When your typical input is just slightly larger than `max_sequence_length`, it may be best to simply truncate; there will be very little information in the second subsequence.
        </note>
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
    <definition term="mpnet">
      (Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.
      <dropdown title="Properties of mpnet">
        `truncate`
        (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
        - `none`: No truncation occurs; the inference request receives an error.
        - `first`: Only the first sequence is truncated.
        - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
        <note>
          For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
        </note>
      </dropdown>
    </definition>
  </definitions>
</dropdown>
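
As a sketch of how the `span` and `truncate` settings fit into a processor definition, the following hypothetical configuration (the model ID, task type, and span value are placeholders) windows long inputs with `bert` tokenization instead of truncating them:
```js
"inference": {
  "model_id": "my_model_id",
  "inference_config": {
    "text_classification": {
      "tokenization": {
        "bert": {
          "truncate": "none",
          "span": 128
        }
      }
    }
  }
}
```

Because `truncate` is `none`, inputs longer than `max_sequence_length` are split into overlapping subsequences of 128 shared tokens rather than rejected.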


## Text similarity configuration options

<definitions>
  <definition term="text_similarity">
    (Optional, object) Text similarity takes an input sequence and compares it with another input sequence. This is commonly referred to as cross-encoding. This task is useful for ranking document text when comparing it to another provided text input.
  </definition>
</definitions>

<dropdown title="Properties of text_similarity inference">
  <definitions>
    <definition term="span_score_combination_function">
      (Optional, string) Identifies how to combine the resulting similarity score when a provided text passage is longer than `max_sequence_length` and must be automatically separated for multiple calls. This is only applicable when `truncate` is `none` and `span` is a non-negative number. The default value is `max`. Available options are:
      - `max`: The maximum score from all the spans is returned.
      - `mean`: The mean score over all the spans is returned.
    </definition>
    <definition term="tokenization">
      (Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is `bert`. Valid tokenization values are:
      - `bert`: Use for BERT-style models
      - `deberta_v2`: Use for DeBERTa v2 and v3-style models
      - `mpnet`: Use for MPNet-style models
      - `roberta`: Use for RoBERTa-style and BART-style models
      - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `xlm_roberta`: Use for XLMRoBERTa-style models
      - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `bert_ja`: Use for BERT-style models trained for the Japanese language.
        Refer to [Properties of `tokenization`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-put-trained-model) to review the properties of the `tokenization` object.
    </definition>
  </definitions>
</dropdown>
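
Combining the options above, a hypothetical `text_similarity` configuration (the model ID and comparison text are placeholders) might span long passages and average the per-span scores:
```js
"inference": {
  "model_id": "my_model_id",
  "inference_config": {
    "text_similarity": {
      "tokenization": {
        "bert": {
          "truncate": "none",
          "span": 64
        }
      },
      "span_score_combination_function": "mean"
    }
  }
}
```

With `span_score_combination_function` set to `mean`, the reported similarity is the average over all spans rather than the single best-matching span.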


## Zero shot classification configuration options

<definitions>
  <definition term="labels">
    (Optional, array) The labels to classify. Can be set at creation for default labels, and then updated during inference.
  </definition>
  <definition term="multi_label">
    (Optional, boolean) Indicates if more than one `true` label is possible given the input. This is useful when labeling text that could pertain to more than one of the input labels. Defaults to `false`.
  </definition>
  <definition term="results_field">
    (Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the `results_field` value of the data frame analytics job that was used to train the model, which defaults to `<dependent_variable>_prediction`.
  </definition>
  <definition term="tokenization">
    (Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is `bert`. Valid tokenization values are:
    - `bert`: Use for BERT-style models
    - `deberta_v2`: Use for DeBERTa v2 and v3-style models
    - `mpnet`: Use for MPNet-style models
    - `roberta`: Use for RoBERTa-style and BART-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `xlm_roberta`: Use for XLMRoBERTa-style models
    - <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to> `bert_ja`: Use for BERT-style models trained for the Japanese language.
    <dropdown title="Properties of tokenization">
      <definitions>
        <definition term="bert">
          (Optional, object) BERT-style tokenization is to be performed with the enclosed settings.
          <dropdown title="Properties of bert">
            `truncate`
            (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
            - `none`: No truncation occurs; the inference request receives an error.
            - `first`: Only the first sequence is truncated.
            - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
            <note>
              For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
            </note>
          </dropdown>
        </definition>
        <definition term="deberta_v2">
          (Optional, object) DeBERTa-style tokenization is to be performed with the enclosed settings.
          <dropdown title="Properties of deberta_v2">
            `truncate`
            (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
            - `balanced`: One or both of the first and second sequences may be truncated so as to balance the tokens included from both sequences.
            - `none`: No truncation occurs; the inference request receives an error.
            - `first`: Only the first sequence is truncated.
            - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
          </dropdown>
        </definition>
        <definition term="roberta">
          (Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.
          <dropdown title="Properties of roberta">
            `truncate`
            (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
            - `none`: No truncation occurs; the inference request receives an error.
            - `first`: Only the first sequence is truncated.
            - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
            <note>
              For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
            </note>
          </dropdown>
        </definition>
        <definition term="mpnet">
          (Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.
          <dropdown title="Properties of mpnet">
            `truncate`
            (Optional, string) Indicates how tokens are truncated when they exceed `max_sequence_length`. The default value is `first`.
            - `none`: No truncation occurs; the inference request receives an error.
            - `first`: Only the first sequence is truncated.
            - `second`: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.
            <note>
              For `zero_shot_classification`, the hypothesis sequence is always the second sequence. Therefore, do not use `second` in this case.
            </note>
          </dropdown>
        </definition>
      </definitions>
    </dropdown>
  </definition>
</definitions>
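
Putting these options together, a hypothetical zero-shot classification processor (the model ID, labels, and result field name are placeholders) might look like this:
```js
"inference": {
  "model_id": "my_model_id",
  "inference_config": {
    "zero_shot_classification": {
      "labels": ["billing", "shipping", "complaint"],
      "multi_label": true,
      "results_field": "topic_prediction"
    }
  }
}
```

Setting `multi_label` to `true` allows a document to match several of the provided labels at once, which suits text that can belong to more than one category.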


## Inference processor examples

```js
"inference":{
  "model_id": "my_model_id",
  "field_map": {
    "original_fieldname": "expected_fieldname"
  },
  "inference_config": {
    "regression": {
      "results_field": "my_regression"
    }
  }
}
```

This configuration specifies a `regression` inference and the results are written to the `my_regression` field contained in the `target_field` results object. The `field_map` configuration maps the field `original_fieldname` from the source document to the field expected by the model.
```js
"inference":{
  "model_id": "my_model_id",
  "inference_config": {
    "classification": {
      "num_top_classes": 2,
      "results_field": "prediction",
      "top_classes_results_field": "probabilities"
    }
  }
}
```

This configuration specifies a `classification` inference. The number of categories for which the predicted probabilities are reported is 2 (`num_top_classes`). The result is written to the `prediction` field and the top classes to the `probabilities` field. Both fields are contained in the `target_field` results object.
For an example that uses natural language processing trained models, refer to [Add NLP inference to ingest pipelines](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/explore-analyze/machine-learning/nlp/ml-nlp-inference).

### Feature importance object mapping

To get the full benefit of aggregating and searching for [feature importance](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/explore-analyze/machine-learning/data-frame-analytics/ml-feature-importance), update the index mapping of the feature importance result field as follows:
```js
"ml.inference.feature_importance": {
  "type": "nested",
  "dynamic": true,
  "properties": {
    "feature_name": {
      "type": "keyword"
    },
    "importance": {
      "type": "double"
    }
  }
}
```

The mapping field name for feature importance (in the example above, it is `ml.inference.feature_importance`) is compounded as follows:
`<ml.inference.target_field>`.`<inference.tag>`.`feature_importance`
- `<ml.inference.target_field>`: defaults to `ml.inference`.
- `<inference.tag>`: if it is not provided in the processor definition, then it is not part of the field path.

For example, if you provide a tag `foo` in the definition as follows:
```js
{
  "tag": "foo",
  ...
}
```

Then, the feature importance value is written to the `ml.inference.foo.feature_importance` field.
You can also specify the target field as follows:
```js
{
  "tag": "foo",
  "target_field": "my_field"
}
```

In this case, feature importance is exposed in the `my_field.foo.feature_importance` field.

### Chat completion inference example

The following example uses an [inference endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference) in an ingest pipeline named `query_helper_pipeline` to perform a chat completion task. The processor generates an Elasticsearch query from natural language input using a prompt designed for a completion task type. Refer to [this list](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put) to find the inference service you use and check the corresponding examples of setting up an endpoint with the chat completion task type.
```json
{
  "processors": [
    {
      "script": {
        "source": "ctx.prompt = 'Please generate an elasticsearch search query on index `articles_index` for the following natural language query. Dates are in the field `@timestamp`, document types are in the field `type` (options are `news`, `publication`), categories in the field `category` and can be multiple (options are `medicine`, `pharmaceuticals`, `technology`), and document names are in the field `title` which should use a fuzzy match. Ignore fields which cannot be determined from the natural language query context: ' + ctx.content" <1>
      }
    },
    {
      "inference": {
        "model_id": "openai_chat_completions", <2>
        "input_output": {
          "input_field": "prompt",
          "output_field": "query"
        }
      }
    },
    {
      "remove": {
        "field": "prompt"
      }
    }
  ]
}
```

The following API request will simulate running a document through the ingest pipeline created previously:
```json
{
  "docs": [
    {
      "_source": {
        "content": "artificial intelligence in medicine articles published in the last 12 months" <1>
      }
    }
  ]
}
```


### Further readings

- [Which job is the best for you? Using LLMs and semantic_text to match resumes to jobs](https://www.elastic.co/search-labs/blog/openwebcrawler-llms-semantic-text-resume-job-search)