Hybrid search with semantic_text
This tutorial walks you through hybrid search using the semantic_text field type together with a text field for lexical search. By the end, you will be able to:
- Create an index mapping that supports storing both text content and vector embeddings for hybrid search
- Ingest documents so the same text is embedded for semantic search and available for full-text search
- Run hybrid queries using retrievers or ES|QL
In hybrid search, semantic retrieval scores by meaning while lexical search scores by textual similarity. Combining them often results in more robust rankings than either alone.
The recommended way to use hybrid search in the Elastic Stack follows the semantic_text workflow: you avoid hand-building inference ingest pipelines for embeddings while still keeping a dedicated text field for keyword-style matching.
In this tutorial, we show code examples for using both Elastic Inference Service (EIS) and machine learning nodes. EIS is automatically enabled on Elastic Cloud Hosted deployments and Serverless projects. You can also use EIS for self-managed clusters.
To use the semantic_text field type with an inference service other than Elastic Inference Service, you must create an inference endpoint using the Create inference API.
To run the curl examples in this tutorial, set the following environment variables:
export ELASTICSEARCH_URL="your-elasticsearch-url"
export API_KEY="your-api-key"
To generate API keys, search for API keys in the global search bar. Learn more about finding your endpoint and credentials.
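The same two variables can drive clients other than curl. The following is a minimal Python sketch, assuming the variable names above. The helper name connection_settings is hypothetical; it only assembles the base URL and the headers that the curl examples send, and does not issue a request.

```python
import os


def connection_settings(env=os.environ):
    """Return (base_url, headers) built from ELASTICSEARCH_URL and API_KEY."""
    # Drop a trailing slash so paths such as "/_bulk" can be appended safely.
    base_url = env["ELASTICSEARCH_URL"].rstrip("/")
    headers = {
        "Content-Type": "application/json",
        # Same scheme the curl examples use: "Authorization: ApiKey <key>".
        "Authorization": f"ApiKey {env['API_KEY']}",
    }
    return base_url, headers


if __name__ == "__main__":
    # Placeholder values for illustration only.
    demo_env = {"ELASTICSEARCH_URL": "https://localhost:9200/", "API_KEY": "example-key"}
    print(connection_settings(demo_env))
```

Passing the environment as an argument keeps the helper easy to exercise with placeholder values.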
The destination index will contain both the embeddings for semantic search and the original text field for full-text search. This structure enables the combination of semantic search and full-text search.
You can run inference either using the Elastic Inference Service or on your own machine learning nodes.
For large-scale dense vector deployments, quantization strategies like BBQ can reduce memory usage. For details, refer to Optimizing vector storage.
In this example, you create an index for hybrid search using Elastic Inference Service. Embeddings are generated with the default inference model for the semantic_text field type.
PUT semantic-embeddings
{
"mappings": {
"properties": {
"content_embedding": {
"type": "semantic_text"
},
"content": {
"type": "text",
"copy_to": "content_embedding"
}
}
}
}
- The name of the field to contain the generated embeddings for semantic search.
- The field to contain the embeddings is a semantic_text field. Since no inference_id is provided, the default inference endpoint is used.
- The name of the field to contain the original text for lexical search.
- The textual data stored in the content field is copied to content_embedding and processed by the inference endpoint.
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content_embedding": {
"type": "semantic_text"
},
"content": {
"type": "text",
"copy_to": "content_embedding"
}
}
}
}'
For production environments, we recommend explicitly specifying the inference_id for semantic_text fields. Default endpoints can change across versions and deployment types, which can lead to issues such as mixed embedding models and inconsistent ranking results.
Below is an example of creating an index mapping using your own ML node with the .elser-2-elasticsearch inference endpoint.
PUT semantic-embeddings
{
"mappings": {
"properties": {
"content_embedding": {
"type": "semantic_text",
"inference_id": ".elser-2-elasticsearch"
},
"content": {
"type": "text",
"copy_to": "content_embedding"
}
}
}
}
- The name of the field to contain the generated embeddings for semantic search.
- The field to contain the embeddings is a semantic_text field.
- The .elser-2-elasticsearch preconfigured inference endpoint for the elasticsearch service is used.
- The name of the field to contain the original text for lexical search.
- The textual data stored in the content field is copied to content_embedding and processed by the inference endpoint.
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content_embedding": {
"type": "semantic_text",
"inference_id": ".elser-2-elasticsearch"
},
"content": {
"type": "text",
"copy_to": "content_embedding"
}
}
}
}'
Example response
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "semantic-embeddings"
}
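The two mappings shown earlier differ only in whether inference_id is set on the semantic_text field. As a rough illustration, the Python sketch below builds either variant as a dictionary. The function name build_mapping is hypothetical; the field names and the .elser-2-elasticsearch endpoint ID are taken from the examples above.

```python
import json


def build_mapping(inference_id=None):
    """Build the tutorial's index mapping as a dict.

    With inference_id=None the semantic_text field carries no inference_id,
    so the default inference endpoint would be used.
    """
    embedding_field = {"type": "semantic_text"}
    if inference_id is not None:
        # Pin the endpoint explicitly, as recommended for production.
        embedding_field["inference_id"] = inference_id
    return {
        "mappings": {
            "properties": {
                "content_embedding": embedding_field,
                # copy_to sends the text to the semantic_text field as well.
                "content": {"type": "text", "copy_to": "content_embedding"},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(".elser-2-elasticsearch"), indent=2))
```

The printed JSON can be used as the body of the PUT semantic-embeddings request.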
With your index mapping in place, you can add some data. You only need to populate the content field. Elasticsearch stores its value as text for lexical search, and copy_to duplicates that same value into the content_embedding field. Because content_embedding is of type semantic_text, Elasticsearch then sends the value to the inference endpoint and stores the resulting embeddings.
Use the _bulk API to ingest the sample documents:
POST _bulk
{ "index": { "_index": "semantic-embeddings", "_id": "1" } }
{ "content": "After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness." }
{ "index": { "_index": "semantic-embeddings", "_id": "2" } }
{ "content": "Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions." }
{ "index": { "_index": "semantic-embeddings", "_id": "3" } }
{ "content": "Tune cluster performance by monitoring thread pools and refresh interval." }
curl -X POST "${ELASTICSEARCH_URL}/_bulk" \
-H "Content-Type: application/x-ndjson" \
-H "Authorization: ApiKey ${API_KEY}" \
--data-binary @- << 'EOF'
{ "index": { "_index": "semantic-embeddings", "_id": "1" } }
{ "content": "After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness." }
{ "index": { "_index": "semantic-embeddings", "_id": "2" } }
{ "content": "Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions." }
{ "index": { "_index": "semantic-embeddings", "_id": "3" } }
{ "content": "Tune cluster performance by monitoring thread pools and refresh interval." }
EOF
Example response
{
"errors": false,
"took": 400,
"items": [
{
"index": {
"_index": "semantic-embeddings",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "semantic-embeddings",
"_id": "2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "semantic-embeddings",
"_id": "3",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 2,
"_primary_term": 1,
"status": 201
}
}
]
}
Each document is indexed with content for search. The same text is copied to content_embedding and embedded through the configured inference endpoint.
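The _bulk body is newline-delimited JSON: one action line followed by one source line per document, with a newline after the final line. The sketch below shows one way to generate such a body in Python. The function name build_bulk_body and the shortened sample texts are illustrative assumptions, not part of the tutorial's dataset.

```python
import json


def build_bulk_body(index, docs):
    """Serialize (doc_id, content) pairs as a _bulk NDJSON body."""
    lines = []
    for doc_id, content in docs:
        # Action line: index this document into `index` with the given ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: only `content` is needed; copy_to fills content_embedding.
        lines.append(json.dumps({"content": content}))
    # The _bulk API expects the final line to end with a newline as well.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("1", "Cool down after running."), ("2", "Monitor thread pools.")]
    print(build_bulk_body("semantic-embeddings", sample), end="")
```

Send the result with the Content-Type: application/x-ndjson header, as in the curl example.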
If you encounter errors, check that your index mapping and inference endpoint are configured correctly.
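A _bulk request can succeed as a whole while individual items fail, so it helps to inspect the top-level errors flag and the per-item results. The following Python sketch assumes the response shape shown in the example response; the helper name failed_items is hypothetical.

```python
def failed_items(bulk_response):
    """Return (_id, status, error) for every bulk item that reports an error."""
    failures = []
    # The top-level flag is false when every item succeeded.
    if not bulk_response.get("errors"):
        return failures
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action, such as "index".
        for result in item.values():
            if "error" in result:
                failures.append((result.get("_id"), result.get("status"), result["error"]))
    return failures


if __name__ == "__main__":
    ok = {"errors": False, "items": [{"index": {"_id": "1", "status": 201}}]}
    print(failed_items(ok))
```

An empty list means every document was accepted. Otherwise, each tuple identifies a document to investigate, for example against the mapping or the inference endpoint configuration.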
Now that you have data in your index, you can run hybrid search to combine lexical matches on content with vector search over content_embedding. You can choose between retrievers or ES|QL syntax.
Both the retriever and ES|QL approaches return hits ranked by a score that combines lexical matches on content with semantic matches on content_embedding. Documents that match on both signals typically rank highest, followed by those that match on only one.
For recommended ways to query and retrieve semantic_text data, refer to Search and retrieve semantic_text fields.
Retrievers provide a structured way to define and combine different search strategies, such as lexical and semantic search, within a single _search request. This example uses the RRF retriever, which merges two standard retrievers: one runs a lexical match on content, the other a match on content_embedding for semantic retrieval.
GET semantic-embeddings/_search
{
"retriever": {
"rrf": {
"retrievers": [
{
"standard": {
"query": {
"match": {
"content": "How to avoid muscle soreness while running?"
}
}
}
},
{
"standard": {
"query": {
"match": {
"content_embedding": "How to avoid muscle soreness while running?"
}
}
}
}
]
}
}
}
- The first standard retriever represents the traditional lexical search.
- Lexical search is performed on the content field using the specified phrase.
- The second standard retriever runs a match query on content_embedding, which performs semantic retrieval for that field type.
- The same natural-language phrase is used as in the lexical branch. Elasticsearch scores content_embedding using semantic retrieval rather than term overlap alone.
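Because both branches use the same query text and differ only in the target field, the request body is easy to generate. The Python sketch below builds the same structure as the Console example. The function name hybrid_rrf_request and its parameter names are hypothetical.

```python
import json


def hybrid_rrf_request(query_text, lexical_field="content", semantic_field="content_embedding"):
    """Build an RRF retriever body with one lexical and one semantic branch."""

    def standard_match(field):
        # A standard retriever wrapping a match query on a single field.
        return {"standard": {"query": {"match": {field: query_text}}}}

    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    standard_match(lexical_field),   # lexical branch on the text field
                    standard_match(semantic_field),  # semantic branch on semantic_text
                ]
            }
        }
    }


if __name__ == "__main__":
    body = hybrid_rrf_request("How to avoid muscle soreness while running?")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET semantic-embeddings/_search.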
curl -X GET "${ELASTICSEARCH_URL}/semantic-embeddings/_search" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"retriever": {
"rrf": {
"retrievers": [
{
"standard": {
"query": {
"match": {
"content": "How to avoid muscle soreness while running?"
}
}
}
},
{
"standard": {
"query": {
"match": {
"content_embedding": "How to avoid muscle soreness while running?"
}
}
}
}
]
}
}
}'
Example response
{
"took": 176,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 0.032786883,
"hits": [
{
"_index": "semantic-embeddings",
"_id": "akiYKZ0BGwHk8ONXXqmi",
"_score": 0.032786883,
"_source": {
"content": "After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness."
}
},
{
"_index": "semantic-embeddings",
"_id": "a0iYKZ0BGwHk8ONXXqmi",
"_score": 0.016129032,
"_source": {
"content": "Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions."
}
},
{
"_index": "semantic-embeddings",
"_id": "bEiYKZ0BGwHk8ONXXqmi",
"_score": 0.015873017,
"_source": {
"content": "Tune cluster performance by monitoring thread pools and refresh interval."
}
}
]
}
}
The returned hits show fused _score rankings after RRF over lexical content and semantic content_embedding retrieval.
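The scores in the example response are consistent with reciprocal rank fusion, where each retriever contributes 1 / (rank_constant + rank) for every document it returns. The Python sketch below reproduces them under two assumptions that are not stated in the response: the default rank_constant of 60, and per-retriever rankings in which the lexical branch matched only the first document while the semantic branch returned all three. The doc-1 to doc-3 labels are placeholders for the three hits in response order.

```python
def rrf_scores(rankings, rank_constant=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        # Ranks start at 1 for the top hit of each retriever.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    lexical = ["doc-1"]                     # assumed: only one lexical match
    semantic = ["doc-1", "doc-2", "doc-3"]  # assumed: semantic ranking of all three
    for doc_id, score in rrf_scores([lexical, semantic]).items():
        print(doc_id, round(score, 9))
```

Under these assumptions, the top document scores 2/61, and the other two score 1/62 and 1/63. These values agree with the _score values in the example response to within floating-point rounding. Because RRF uses only ranks, a document found by both retrievers gains a clear margin over one found by a single retriever.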
ES|QL is a piped query language that supports both lexical and semantic search. It lets you combine keyword matching, vector search, scoring, and result processing in a single query.
POST /_query?format=txt
{
"query": """
FROM semantic-embeddings METADATA _score
| WHERE content: "muscle soreness running?" OR match(content_embedding, "How to avoid muscle soreness while running?", { "boost": 0.75 })
| KEEP content, content_embedding
| SORT _score DESC
| LIMIT 1000
"""
}
- The METADATA _score clause returns the relevance score of each document.
- The match (:) operator matches keywords on content.
- match() runs semantic retrieval on content_embedding with boost 0.75.
- KEEP selects content and content_embedding columns for the text-formatted response.
- Sorts by descending score and limits to 1000 results.
curl -X POST "${ELASTICSEARCH_URL}/_query?format=txt" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"query": "FROM semantic-embeddings METADATA _score | WHERE content: \"muscle soreness running?\" OR match(content_embedding, \"How to avoid muscle soreness while running?\", { \"boost\": 0.75 }) | KEEP content, content_embedding | SORT _score DESC | LIMIT 1000"
}'
Example response
content | content_embedding | _score
-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+-------------------
After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness.|After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness.|21.63957405090332
Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions.|Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions.|8.419901847839355
Tune cluster performance by monitoring thread pools and refresh interval. |Tune cluster performance by monitoring thread pools and refresh interval. |0.22893255949020386
Rows are sorted by _score descending after combining the content keyword match and boosted content_embedding match.
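The Console example wraps the query in triple quotes, while the curl example sends it as a single JSON string with escaped double quotes. The Python sketch below shows one way to produce the curl-style body from the multi-line query. The function name esql_request_body is hypothetical, and json.dumps handles the escaping.

```python
import json

# The multi-line query from the Console example.
ESQL_QUERY = """
FROM semantic-embeddings METADATA _score
| WHERE content: "muscle soreness running?" OR match(content_embedding, "How to avoid muscle soreness while running?", { "boost": 0.75 })
| KEEP content, content_embedding
| SORT _score DESC
| LIMIT 1000
"""


def esql_request_body(query):
    """Collapse a multi-line ES|QL query to one line and wrap it in a JSON body."""
    one_line = " ".join(line.strip() for line in query.strip().splitlines())
    # json.dumps escapes the inner double quotes as \" for the JSON string.
    return json.dumps({"query": one_line})


if __name__ == "__main__":
    print(esql_request_body(ESQL_QUERY))
```

The resulting string can be posted to _query?format=txt with the Content-Type: application/json header.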
- For recommended ways to query and retrieve semantic_text data, refer to Search and retrieve semantic_text fields.
- For a notebook-style walkthrough of semantic_text in hybrid search, see this notebook.
- To set up semantic-only search on the same sample data model, follow the Semantic search with semantic_text tutorial.
- To learn how to optimize storage and search performance when using dense vector embeddings, refer to Optimizing vector storage.