Hybrid search with `semantic_text`

This tutorial walks you through hybrid search using the semantic_text field type together with a text field for lexical search. By the end, you will be able to:

Create an index mapping that supports storing both text content and vector embeddings for hybrid search
Ingest documents so the same text is embedded for semantic search and available for full-text search
Run hybrid queries using retrievers or ES|QL

In hybrid search, semantic retrieval scores by meaning while lexical search scores by textual similarity. Combining them often results in more robust rankings than either alone.

The recommended way to use hybrid search in the Elastic Stack follows the semantic_text workflow: you avoid hand-building inference ingest pipelines for embeddings while still keeping a dedicated text field for keyword-style matching.

Requirements

In this tutorial, we show code examples for using both Elastic Inference Service (EIS) and machine learning nodes. EIS is automatically enabled on Elastic Cloud Hosted deployments and Serverless projects. You can also use EIS for self-managed clusters.
To use the semantic_text field type with an inference service other than Elastic Inference Service, you must create an inference endpoint using the Create inference API.
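As a rough illustration of that requirement, the following Python sketch assembles a Create inference API request for an ELSER endpoint backed by the elasticsearch service. The endpoint name my-elser-endpoint, the helper name, and the allocation and thread counts are hypothetical placeholders, not values from this tutorial. Check the Create inference API reference for the settings your service requires. The code only builds the request and does not send it.

```python
import json


def build_inference_request(inference_id, task_type="sparse_embedding"):
    """Return (method, path, body) for a Create inference API call.

    Assumption: the endpoint is backed by the `elasticsearch` service
    running ELSER v2 on machine learning nodes. The allocation and
    thread counts are illustrative, not tuned values.
    """
    path = f"/_inference/{task_type}/{inference_id}"
    body = {
        "service": "elasticsearch",
        "service_settings": {
            "model_id": ".elser_model_2",
            "num_allocations": 1,
            "num_threads": 1,
        },
    }
    return "PUT", path, body


# "my-elser-endpoint" is a hypothetical endpoint name.
method, path, body = build_inference_request("my-elser-endpoint")
print(method, path)
print(json.dumps(body, indent=2))
```

You would then reference the endpoint name in the inference_id setting of the semantic_text field.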

Tip

To run the curl examples in this tutorial, set the following environment variables:

export ELASTICSEARCH_URL="your-elasticsearch-url"
export API_KEY="your-api-key"

To generate API keys, search for API keys in the global search bar. Learn more about finding your endpoint and credentials.

Step 1: Create the index mapping

The destination index will contain both the embeddings for semantic search and the original text field for full-text search. This structure enables the combination of semantic search and full-text search.

You can run inference either using the Elastic Inference Service or on your own machine learning nodes.

Tip

For large-scale dense vector deployments, quantization strategies like BBQ can reduce memory usage. For details, refer to Optimizing vector storage.

Option 1: Use Elastic Inference Service (recommended)

In this example, you create an index for hybrid search using Elastic Inference Service. Embeddings are generated with the default inference model for the semantic_text field type.

Console

PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "semantic_text"
      },
      "content": {
        "type": "text",
        "copy_to": "content_embedding"
      }
    }
  }
}
		
	

content_embedding is the field that contains the generated embeddings for semantic search.
It is a semantic_text field. Because no inference_id is provided, the default inference endpoint is used.
content is the field that contains the original text for lexical search.
copy_to copies the text stored in content to content_embedding, where the inference endpoint processes it.

curl

curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "mappings": {
         "properties": {
           "content_embedding": {
             "type": "semantic_text"
           },
           "content": {
             "type": "text",
             "copy_to": "content_embedding"
           }
         }
       }
     }'
		
	


Important

For production environments, we recommend explicitly specifying the inference_id for semantic_text fields. Default endpoints can change across versions and deployment types, which can lead to issues such as mixed embedding models and inconsistent ranking results.
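One way to follow this recommendation is to generate the mapping in code so that the endpoint is always pinned. The following Python sketch is illustrative only: build_hybrid_mapping is a hypothetical helper, and the field names match the mapping in this tutorial. It refuses to produce a mapping when no inference_id is given.

```python
import json


def build_hybrid_mapping(inference_id):
    """Build the tutorial's mapping with an explicit inference endpoint."""
    if not inference_id:
        raise ValueError("inference_id must be set explicitly")
    return {
        "mappings": {
            "properties": {
                # semantic_text field that stores the embeddings.
                "content_embedding": {
                    "type": "semantic_text",
                    "inference_id": inference_id,
                },
                # Plain text field for lexical search, copied into
                # content_embedding at index time.
                "content": {
                    "type": "text",
                    "copy_to": "content_embedding",
                },
            }
        }
    }


mapping = build_hybrid_mapping(".elser-2-elasticsearch")
print(json.dumps(mapping, indent=2))
```

The printed JSON can be used as the body of the PUT semantic-embeddings request.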

Option 2: Use machine learning nodes

Below is an example of creating an index mapping using your own ML node with the .elser-2-elasticsearch inference endpoint.

Console

PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch"
      },
      "content": {
        "type": "text",
        "copy_to": "content_embedding"
      }
    }
  }
}
		
	

content_embedding is the field that contains the generated embeddings for semantic search.
It is a semantic_text field.
The inference_id setting selects .elser-2-elasticsearch, the preconfigured inference endpoint for the elasticsearch service.
content is the field that contains the original text for lexical search.
copy_to copies the text stored in content to content_embedding, where the inference endpoint processes it.

curl

curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "mappings": {
         "properties": {
           "content_embedding": {
             "type": "semantic_text",
             "inference_id": ".elser-2-elasticsearch"
           },
           "content": {
             "type": "text",
             "copy_to": "content_embedding"
           }
         }
       }
     }'
		
	


Step 2: Ingest data

With your index mapping in place, you can add some data. You only need to populate the content field. Elasticsearch stores its value as text for lexical search, and copy_to duplicates that same value into the content_embedding field. Because content_embedding is of type semantic_text, Elasticsearch then sends the value to the inference endpoint and stores the resulting embeddings.

Use the _bulk API to ingest the following sample documents:

Console

POST _bulk
{ "index": { "_index": "semantic-embeddings", "_id": "1" } }
{ "content": "After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness." }
{ "index": { "_index": "semantic-embeddings", "_id": "2" } }
{ "content": "Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions." }
{ "index": { "_index": "semantic-embeddings", "_id": "3" } }
{ "content": "Tune cluster performance by monitoring thread pools and refresh interval." }
		
	

curl

curl -X POST "${ELASTICSEARCH_URL}/_bulk" \
     -H "Content-Type: application/x-ndjson" \
     -H "Authorization: ApiKey ${API_KEY}" \
     --data-binary @- << 'EOF'
{ "index": { "_index": "semantic-embeddings", "_id": "1" } }
{ "content": "After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness." }
{ "index": { "_index": "semantic-embeddings", "_id": "2" } }
{ "content": "Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions." }
{ "index": { "_index": "semantic-embeddings", "_id": "3" } }
{ "content": "Tune cluster performance by monitoring thread pools and refresh interval." }
EOF
		
	

						
{
  "errors": false,
  "took": 400,
  "items": [
    {
      "index": {
        "_index": "semantic-embeddings",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "semantic-embeddings",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "semantic-embeddings",
        "_id": "3",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 2,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
		
	

Each document is indexed with its content field available for lexical search. The same text is copied to content_embedding and embedded through the configured inference endpoint.

If you encounter errors, check that your index mapping and inference endpoint are configured correctly.
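To make that check concrete, the following Python sketch builds the NDJSON body for a _bulk request and scans a response for failed actions. Both helper names are hypothetical, and the failing response is made up for illustration; only the overall shape (a top-level errors flag and one entry per action under items) follows the response shown above.

```python
import json


def build_bulk_body(index, docs):
    """Serialize {id: content} pairs as NDJSON for the _bulk API."""
    lines = []
    for doc_id, content in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"content": content}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def failed_items(bulk_response):
    """Return (id, error) pairs for every action that reported an error."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action name, for example "index".
        for result in item.values():
            if "error" in result:
                failures.append((result.get("_id"), result["error"]))
    return failures


body = build_bulk_body("semantic-embeddings", {"1": "example text"})
print(body, end="")

# Hypothetical failing response, shaped like the _bulk output above.
sample = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400,
                   "error": {"type": "example_error", "reason": "example"}}},
    ],
}
print(failed_items(sample))
```

Checking the top-level errors flag first avoids walking the items list when every action succeeded.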

Step 3: Run a hybrid search query

Now that you have data in your index, you can run hybrid search to combine lexical matches on content with vector search over content_embedding. You can choose between retrievers or ES|QL syntax.

Both the retriever and ES|QL approaches return hits ranked by a score that fuses lexical matches on content with semantic matches on content_embedding. Passages that match on both signals rank highest, followed by those that match on only one.

Note

For recommended ways to query and retrieve semantic_text data, refer to Search and retrieve semantic_text fields.

Use retrievers

Retrievers provide a structured way to define and combine different search strategies, such as lexical and semantic search, within a single _search request. This example uses the RRF retriever, which merges two standard retrievers: one runs a lexical match on content, the other a match on content_embedding for semantic retrieval.

Console

GET semantic-embeddings/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "match": {
                "content": "How to avoid muscle soreness while running?"
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "match": {
                "content_embedding": "How to avoid muscle soreness while running?"
              }
            }
          }
        }
      ]
    }
  }
}
		
	

The first standard retriever represents the traditional lexical search.
Lexical search is performed on the content field using the specified phrase.
The second standard retriever runs a match query on content_embedding, which performs semantic retrieval for that field type.
The same natural-language phrase is used as in the lexical branch. Elasticsearch scores content_embedding using semantic retrieval rather than term overlap alone.

curl

curl -X GET "${ELASTICSEARCH_URL}/semantic-embeddings/_search" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "retriever": {
         "rrf": {
           "retrievers": [
             {
               "standard": {
                 "query": {
                   "match": {
                     "content": "How to avoid muscle soreness while running?"
                   }
                 }
               }
             },
             {
               "standard": {
                 "query": {
                   "match": {
                     "content_embedding": "How to avoid muscle soreness while running?"
                   }
                 }
               }
             }
           ]
         }
       }
     }'
		
	

						
{
  "took": 176,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 0.032786883,
    "hits": [
      {
        "_index": "semantic-embeddings",
        "_id": "1",
        "_score": 0.032786883,
        "_source": {
          "content": "After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness."
        }
      },
      {
        "_index": "semantic-embeddings",
        "_id": "2",
        "_score": 0.016129032,
        "_source": {
          "content": "Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions."
        }
      },
      {
        "_index": "semantic-embeddings",
        "_id": "3",
        "_score": 0.015873017,
        "_source": {
          "content": "Tune cluster performance by monitoring thread pools and refresh interval."
        }
      }
    ]
  }
}
		
	

The hits are ordered by the fused _score that RRF computes from the lexical ranking on content and the semantic ranking on content_embedding.
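The scores in this response can be reproduced by hand. The following Python sketch implements reciprocal rank fusion, where each retriever contributes 1 / (rank_constant + rank) for every document it returns, using a rank constant of 60. The per-retriever rankings are an assumption inferred from the scores above: the lexical retriever appears to return only the running document, while the semantic retriever ranks all three. The names doc1, doc2, and doc3 are placeholders for the running, marathon, and cluster documents.

```python
def rrf_scores(rankings, rank_constant=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Assumed rankings, inferred from the response above.
lexical = ["doc1"]
semantic = ["doc1", "doc2", "doc3"]

for doc_id, score in rrf_scores([lexical, semantic]):
    print(doc_id, round(score, 6))
# doc1 0.032787
# doc2 0.016129
# doc3 0.015873
```

The first document scores 1/61 + 1/61 because both retrievers rank it first. The other two receive only the semantic contribution, 1/62 and 1/63, which agrees with the _score values in the response up to floating-point precision.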

Use ES|QL

ES|QL is a piped query language that supports both lexical and semantic search, so you can combine keyword matching, vector search, scoring, and result processing in a single query.

Console

POST /_query?format=txt
{
  "query": """
    FROM semantic-embeddings METADATA _score
    | WHERE content: "muscle soreness running?" OR match(content_embedding, "How to avoid muscle soreness while running?", { "boost": 0.75 })
    | KEEP content, content_embedding, _score
    | SORT _score DESC
    | LIMIT 1000
  """
}

The METADATA _score clause returns the relevance score of each document.
The match (:) operator matches keywords on content. match() runs semantic retrieval on content_embedding with boost 0.75.
KEEP selects the content, content_embedding, and _score columns for the text-formatted response. _score must be kept because the SORT step that follows references it.
Sorts by descending score and limits to 1000 results.

curl

curl -X POST "${ELASTICSEARCH_URL}/_query?format=txt" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "query": "FROM semantic-embeddings METADATA _score | WHERE content: \"muscle soreness running?\" OR match(content_embedding, \"How to avoid muscle soreness while running?\", { \"boost\": 0.75 }) | KEEP content, content_embedding, _score | SORT _score DESC | LIMIT 1000"
     }'
		
	

                                                     content                                                     |                                                content_embedding                                                |      _score
-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+-------------------
After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness.|After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness.|21.63957405090332
Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions.|Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions.|8.419901847839355
Tune cluster performance by monitoring thread pools and refresh interval.                                        |Tune cluster performance by monitoring thread pools and refresh interval.                                        |0.22893255949020386
		
	

Rows are sorted by _score descending after combining the content keyword match and boosted content_embedding match.
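The curl form of this query needs every inner double quote escaped by hand. As an alternative, the following Python sketch builds the same kind of request body and lets json.dumps handle the escaping. build_esql_request is a hypothetical helper, and the sketch assumes plain-text search phrases like the ones in this tutorial. It keeps _score in the KEEP step so that SORT can reference it.

```python
import json


def build_esql_request(lexical_query, semantic_query, boost=0.75):
    """Build the JSON body for POST /_query from two search phrases."""
    # For plain text, json.dumps yields a double-quoted string that is
    # also usable as an ES|QL string literal.
    lexical = json.dumps(lexical_query)
    semantic = json.dumps(semantic_query)
    options = json.dumps({"boost": boost})
    query = (
        "FROM semantic-embeddings METADATA _score "
        f"| WHERE content: {lexical} "
        f"OR match(content_embedding, {semantic}, {options}) "
        "| KEEP content, content_embedding, _score "
        "| SORT _score DESC "
        "| LIMIT 1000"
    )
    # The outer dumps escapes the inner quotes for the JSON request body.
    return json.dumps({"query": query})


request_body = build_esql_request(
    "muscle soreness running?",
    "How to avoid muscle soreness while running?",
)
print(request_body)
```

The printed string can be sent as the body of POST /_query?format=txt.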

For recommended ways to query and retrieve semantic_text data, refer to Search and retrieve semantic_text fields.
For a notebook-style walkthrough of semantic_text in hybrid search, see this notebook.
To set up semantic-only search on the same sample data model, follow the Semantic search with semantic_text tutorial.
To learn how to optimize storage and search performance when using dense vector embeddings, refer to Optimizing vector storage.
