Set up and configure `semantic_text` fields

This page provides instructions for setting up and configuring semantic_text fields. Learn how to configure inference endpoints, including default and preconfigured options, ELSER on EIS, custom endpoints, and dedicated endpoints for ingestion and search operations.

Configure inference endpoints

You can configure inference endpoints for semantic_text fields in the following ways:

Use ELSER on EIS
Default and preconfigured endpoints
Use a custom inference endpoint

Note

If you use a custom inference endpoint through your ML node and not through Elastic Inference Service (EIS), the recommended method is to use dedicated endpoints for ingestion and search.

If you use EIS, you don't have to set up dedicated endpoints.

Use default endpoints

A default endpoint is the inference endpoint that is used when you create a semantic_text field without specifying an inference_id.

The following example shows a semantic_text field configured to use the default inference endpoint:

						PUT my-index-000001
					{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text"
      }
    }
  }
}
		
	

The default inference endpoint varies by deployment type and version:

On Elastic Cloud Hosted deployments running Elastic Stack 9.4+ and on Serverless, the inference_id parameter defaults to .jina-embeddings-v5-text-small and runs on EIS.

Important

Jina models are not supported for deployment on machine learning nodes. They can be used through EIS or external inference providers, and therefore require external network connectivity. Fully air-gapped environments are not currently supported.
On Elastic Cloud Hosted deployments running Elastic Stack 9.3, the inference_id parameter defaults to .elser-2-elastic and runs on EIS.
On Elastic Cloud Hosted deployments running Elastic Stack 9.0-9.2, the inference_id parameter defaults to .elser-2-elasticsearch and runs on the elasticsearch service.

If you use the default inference endpoint, it might be updated to a newer version and use a different embedding model than the previous default endpoints. Queries that target these indices together can produce unexpected ranking results. For details, refer to potential issues when mixing embedding models across indices.

If a semantic_text field relies on the default inference endpoint, the model used to generate embeddings might change across versions or deployments. This can result in indices using different embedding models.

For example, if the semantic_text field is created without specifying inference_id, indices created on Elastic Cloud 9.3 use the .elser-2-elastic endpoint by default, while indices created on Elastic Cloud 9.4+ use .jina-embeddings-v5-text-small. As a result, older indices contain ELSER embeddings while newer indices contain Jina embeddings.

Mixed embedding models across indices can occur in several common scenarios, including:

Data streams with ILM rollover: When a data stream rolls over, older backing indices may contain ELSER embeddings while newer indices created after an upgrade use Jina embeddings.
Aliases referencing multiple indices: An alias can point to several indices created at different times. If some indices use ELSER and others use Jina, searches against the alias will query both.
Explicit multi-index searches: Queries that target multiple indices (for example GET index1,index2/_search) might combine results from indices using different embedding models.
Cross-cluster search: Searches across multiple clusters may query indices created on different stack versions, which may use different default inference endpoints.

Queries that target indices using different embedding models can lead to issues or unexpected results. The following sections describe these issues and how to mitigate them.

Incorrect ranking due to different scoring scales

ELSER and Jina use different scoring ranges. ELSER scores typically range between 0 and above 10, while Jina scores are normalized between 0 and 1.

When results from both models are ranked together, ELSER documents might appear ahead of Jina documents even if the Jina results are more relevant. This can lead to misleading rankings without any errors being returned.

To mitigate this issue, ensure that indices queried together use the same embedding model by explicitly specifying the inference_id when defining semantic_text fields:

						PUT my-index-000001
					{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text",
        "inference_id": ".jina-embeddings-v5-text-small"
      }
    }
  }
}
		
	

If indices using different embedding models must be queried together, normalize the scores using a linear retriever:

						GET my-index/_search
					{
  "retriever": {
    "linear": {
      "query": "how do neural networks learn",
      "fields": ["inference_field"],
      "normalizer": "minmax"
    }
  }
}
		
	

Increased inference cost during search

When a query targets indices that use different inference endpoints, Elasticsearch must generate query embeddings for each model. This increases inference workload and cost during search.

To mitigate this issue, ensure that indices queried together use the same inference endpoint. You can do this by:

explicitly setting the inference_id when defining the semantic_text field for new indices, or
by reindexing older indices with the desired endpoint.

Alerts based on raw relevance scores might stop triggering

Some alerts or rules rely on raw _score values. Because ELSER and Jina use different score ranges, score thresholds designed for ELSER might no longer work when results are generated with Jina.

For example, a condition such as _score > 10 might never be satisfied by Jina results, because Jina scores are normalized between 0 and 1.

To mitigate this issue, adjust alert thresholds to match the scoring range of the embedding model being used, or avoid relying on raw _score values in alert conditions.

Use preconfigured endpoints

Preconfigured endpoints are inference endpoints that are automatically available in the deployment or project and do not require manual creation. The available preconfigured endpoints vary across deployment types and versions.

To view the list of available preconfigured endpoints for your deployment, go to Inference endpoints in Kibana.

To use a preconfigured endpoint, set the inference_id parameter to the identifier of the endpoint you want to use:

						PUT my-index-000004
					{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text",
        "inference_id": ".jina-embeddings-v5-text-nano"
      }
    }
  }
}
		
	

Use ELSER on EIS

If you use the preconfigured .elser-2-elastic endpoint that utilizes the ELSER model as a service through the Elastic Inference Service (ELSER on EIS), you can set up semantic_text with the following API request:

Using ELSER on EIS on Serverless

						PUT my-index-000001
					{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text"
      }
    }
  }
}
		
	

Using ELSER on EIS in Cloud

						PUT my-index-000001
					{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elastic"
      }
    }
  }
}
		
	

Use a custom inference endpoint

To use a custom inference endpoint instead of the default or preconfigured endpoints, you must Create inference API and specify its inference_id when setting up the semantic_text field type.

						PUT my-index-000002
					{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text",
        "inference_id": "my-openai-endpoint"
      }
    }
  }
}
		
	

The inference_id of the inference endpoint to use to generate embeddings.

Use dedicated endpoints for ingestion and search

If you use a custom inference endpoint through your ML node and not through Elastic Inference Service, the recommended way to use semantic_text is by having dedicated inference endpoints for ingestion and search.

This ensures that search speed remains unaffected by ingestion workloads, and vice versa. After creating dedicated inference endpoints for both, you can reference them using the inference_id and search_inference_id parameters when setting up the index mapping for an index that uses the semantic_text field.

						PUT my-index-000003
					{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text",
        "inference_id": "my-elser-endpoint-for-ingest",
        "search_inference_id": "my-elser-endpoint-for-search"
      }
    }
  }
}
		
	

Set `index_options` for `sparse_vectors`

Configuring index_options for sparse vector fields lets you configure token pruning, which controls whether non-significant or overly frequent tokens are omitted to improve query performance.

The following example enables token pruning and sets pruning thresholds for a sparse_vector field:

						PUT semantic-embeddings
					{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "index_options": {
          "sparse_vector": {
            "prune": true,
            "pruning_config": {
              "tokens_freq_ratio_threshold": 10,
              "tokens_weight_threshold": 0.5
            }
          }
        }
      }
    }
  }
}
		
	

(Optional) Enables pruning. Default is true.
(Optional) Prunes tokens whose frequency is more than 10 times the average token frequency in the field. Default is 5.
(Optional) Prunes tokens whose weight is lower than 0.5. Default is 0.4.

Learn more about sparse_vector index options settings and token pruning.

Set `index_options` for `dense_vectors`

Configuring index_options for dense vector fields lets you control how dense vectors are indexed for kNN search. You can select the indexing algorithm, such as int8_hnsw, int4_hnsw, or disk_bbq, among other available index options.

The following example shows how to configure index_options for a dense vector field using the int8_hnsw indexing algorithm:

						PUT semantic-embeddings
					{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "index_options": {
          "dense_vector": {
            "type": "int8_hnsw",
            "m": 15,
            "ef_construction": 90
          }
        }
      }
    }
  }
}
		
	

(Optional) Selects the int8_hnsw vector quantization strategy. Learn about default quantization types.
(Optional) Sets m to 15 to control how many neighbors each node connects to in the HNSW graph. Default is 16.
(Optional) Sets ef_construction to 90 to control how many candidate neighbors are considered during graph construction. Default is 100.

Default `bfloat16` element type

For inference endpoints that produce float embeddings, semantic_text fields automatically use the bfloat16 element type. This reduces the storage required per vector dimension from 4 to 2 bytes with a negligible impact on search relevance for the vast majority of use cases.

The bfloat16 format uses a 2-byte floating-point encoding that maintains the same value range as 4-byte floats, but with reduced precision. You can check if your semantic_text field is using bfloat16 by default by inspecting the default index_options for the field using the get field mapping API:

				GET semantic-embeddings/_mapping/field/content?include_defaults

		{
  "semantic-embeddings": {
    "mappings": {
      "content": {
        "full_name": "content",
        "mapping": {
          "content": {
            "type": "semantic_text",
            "inference_id": "my-float-embedding-endpoint",
            "index_options": {
              "dense_vector": {
                "element_type": "bfloat16"
              }
            }
          }
        }
      }
    }
  }
}
		
	

Indicates that the semantic_text field defaulted to the bfloat16 element type.

Override the default element type

If your use case requires full float precision, you can override the default bfloat16 element type by specifying element_type in index_options.dense_vector:

						PUT semantic-embeddings
					{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-float-embedding-endpoint",
        "index_options": {
          "dense_vector": {
            "element_type": "float"
          }
        }
      }
    }
  }
}
		
	

Overrides the default bfloat16 element type to use full-precision float (4 bytes per dimension).

You can also combine element_type with other index options:

						PUT semantic-embeddings
					{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-float-embedding-endpoint",
        "index_options": {
          "dense_vector": {
            "element_type": "float",
            "type": "int8_hnsw"
          }
        }
      }
    }
  }
}
		
	

Note

The element_type override is only valid for inference endpoints that produce float embeddings. For float models, valid values are float and bfloat16. For other model element types (such as byte or bit), the element_type must match the model's native element type if specified.

Set up and configure semantic_text fields

Set up and configure `semantic_text` fields