Semantic text field type
The semantic_text field mapping can be added regardless of license state. However, it typically calls the Inference API, which requires an appropriate license. In these cases, using semantic_text in a cluster without the appropriate license causes operations such as indexing and reindexing to fail.
The semantic_text field type simplifies semantic search by providing sensible defaults that automate most of the manual work typically required for vector search. Using semantic_text, you don't have to manually configure mappings, set up ingestion pipelines, or handle chunking. The field type automatically:
- Configures index mappings: Chooses the correct field type (sparse_vector or dense_vector), dimensions, similarity functions, and storage optimizations based on the inference endpoint.
- Generates embeddings during indexing: Automatically generates embeddings when you index documents, without requiring ingestion pipelines or inference processors.
- Handles chunking: Automatically chunks long text documents during indexing.
The following example creates an index mapping with a semantic_text field, using default values:
PUT semantic-embeddings
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text"
}
}
}
}
If you don't specify an inference_id, as in the example above, and later upgrade to a newer version, newly created indices might use a different embedding model than existing ones. Queries that target these indices together can produce unexpected ranking results.
For details, refer to potential issues when mixing embedding models across indices.
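One way to avoid this is to pin the field to an explicit endpoint with inference_id. The following sketch assumes an inference endpoint named my-elser-endpoint (a hypothetical name) already exists in the cluster:

PUT pinned-semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-elser-endpoint"
      }
    }
  }
}

Because the endpoint is explicit, indices created before and after an upgrade continue to use the same embedding model.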
The following example creates an index mapping with a semantic_text field that uses dense vectors:
PUT semantic-embeddings
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": "my-inference-endpoint",
"search_inference_id": "my-search-inference-endpoint",
"index_options": {
"dense_vector": {
"type": "bbq_disk"
}
},
"chunking_settings": {
"strategy": "word",
"max_chunk_size": 120,
"overlap": 40
}
}
}
}
}
- inference_id: (Optional) Specifies the inference endpoint used to generate embeddings at index time. If you don't specify an inference_id, the semantic_text field uses a default inference endpoint.
- search_inference_id: (Optional) The inference endpoint used to generate embeddings at query time. If not specified, the endpoint defined by inference_id is used at both index and query time.
- index_options: (Optional) Configures how the underlying vector representation is indexed. In this example, bbq_disk is selected for dense vectors. You can configure different index options depending on whether the field uses dense or sparse vectors. Learn how to set index_options for sparse_vector and how to set index_options for dense_vector.
- chunking_settings: (Optional) Overrides the chunking settings from the inference endpoint. In this example, the word strategy splits text on individual words with a maximum of 120 words per chunk and an overlap of 40 words between chunks. The default chunking strategy is sentence.
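Once documents are indexed, the field can be searched with the semantic query type. This sketch assumes the semantic-embeddings index from the example above has been populated with documents:

GET semantic-embeddings/_search
{
  "query": {
    "semantic": {
      "field": "content",
      "query": "How is long text split into chunks?"
    }
  }
}

The query text is embedded at search time using the endpoint defined by search_inference_id if one is configured; otherwise the inference_id endpoint is used.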
For a complete example, refer to the Semantic search with semantic_text tutorial.
The semantic_text field type documentation is organized into reference content and how-to guides.
The Reference section provides technical reference content:
- Parameters: Parameter descriptions for semantic_text fields.
- Inference endpoints: Overview of inference endpoints used with semantic_text fields.
- Chunking: How semantic_text automatically processes long text passages by generating smaller chunks.
- Pre-filtering for dense vector queries: Automatic pre-filtering behavior for dense vector queries on semantic_text fields.
- Limitations: Current limitations of semantic_text fields.
- Document count discrepancy: Understanding document counts in _cat/indices for indices with semantic_text fields.
- Querying semantic_text fields: Supported query types for semantic_text fields.
The How-to guides section organizes procedure descriptions and examples into the following guides:
- Set up and configure semantic_text fields: Learn how to configure inference endpoints, including default and preconfigured options, ELSER on EIS, custom endpoints, and dedicated endpoints for ingestion and search operations.
- Ingest data with semantic_text fields: Learn how to index pre-chunked content, use copy_to and multi-fields to collect values from multiple fields, and perform updates and partial updates to optimize ingestion costs.
- Search and retrieve semantic_text fields: Learn how to query semantic_text fields, retrieve indexed chunks, return field embeddings, and highlight the most relevant fragments from search results.