
Hybrid search with semantic_text

This tutorial walks you through hybrid search using the semantic_text field type together with a text field for lexical search. By the end, you will be able to:

  • Create an index mapping that supports storing both text content and vector embeddings for hybrid search
  • Ingest documents so the same text is embedded for semantic search and available for full-text search
  • Run hybrid queries using retrievers or ES|QL

In hybrid search, semantic retrieval scores by meaning while lexical search scores by textual similarity. Combining them often results in more robust rankings than either alone.

The recommended way to use hybrid search in the Elastic Stack follows the semantic_text workflow: you avoid hand-building inference ingest pipelines for embeddings while still keeping a dedicated text field for keyword-style matching.

Tip

To run the curl examples in this tutorial, set the following environment variables:

export ELASTICSEARCH_URL="your-elasticsearch-url"
export API_KEY="your-api-key"
		

To generate API keys, search for API keys in the global search bar. Learn more about finding your endpoint and credentials.
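
If you prefer scripting over curl, the same two environment variables can drive a small helper. The following is a minimal sketch using only the Python standard library; the function name build_request is hypothetical and not part of any Elastic client. It builds the request without sending it.

```python
import json
import os
import urllib.request


def build_request(method, path, body=None):
    """Build (but do not send) an authenticated Elasticsearch request.

    Reads ELASTICSEARCH_URL and API_KEY from the environment, mirroring
    the headers used by the curl examples in this tutorial.
    """
    base_url = os.environ["ELASTICSEARCH_URL"].rstrip("/")
    api_key = os.environ["API_KEY"]
    # Serialize the body to JSON bytes only when one is provided.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{base_url}/{path.lstrip('/')}",
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send the request. Keeping construction separate from sending makes it easy to inspect the URL and headers first.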

The destination index will contain both the embeddings for semantic search and the original text field for full-text search. This structure enables the combination of semantic search and full-text search.

You can run inference either using the Elastic Inference Service or on your own machine learning nodes.

Tip

For large-scale dense vector deployments, quantization strategies like BBQ can reduce memory usage. For details, refer to Optimizing vector storage.

In this example, you create an index for hybrid search using Elastic Inference Service. Embeddings are generated with the default inference model for the semantic_text field type.

PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "semantic_text"
      },
      "content": {
        "type": "text",
        "copy_to": "content_embedding"
      }
    }
  }
}
		
  1. The name of the field to contain the generated embeddings for semantic search.
  2. The field to contain the embeddings is a semantic_text field. Since no inference_id is provided, the default inference endpoint is used.
  3. The name of the field to contain the original text for lexical search.
  4. The textual data stored in the content field is copied to content_embedding and processed by the inference endpoint.
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "mappings": {
         "properties": {
           "content_embedding": {
             "type": "semantic_text"
           },
           "content": {
             "type": "text",
             "copy_to": "content_embedding"
           }
         }
       }
     }'
		
Important

For production environments, we recommend explicitly specifying the inference_id for semantic_text fields. Default endpoints can change across versions and deployment types, which may lead to issues such as mixed embedding models and inconsistent ranking results.
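
One way to enforce this recommendation before creating an index is to scan the mapping for semantic_text fields that omit inference_id. The following is a rough sketch, not an Elastic utility; the function name fields_missing_inference_id is hypothetical, and it assumes the mapping is available as a Python dictionary shaped like the request bodies in this tutorial.

```python
def fields_missing_inference_id(mappings):
    """Return paths of semantic_text fields that rely on the default endpoint."""
    missing = []

    def walk(properties, prefix=""):
        for name, spec in properties.items():
            path = f"{prefix}{name}"
            if spec.get("type") == "semantic_text" and "inference_id" not in spec:
                missing.append(path)
            # Object fields nest their sub-fields under "properties".
            walk(spec.get("properties", {}), prefix=f"{path}.")

    walk(mappings.get("properties", {}))
    return missing


# The mapping from the first example: no inference_id is set.
default_mapping = {
    "properties": {
        "content_embedding": {"type": "semantic_text"},
        "content": {"type": "text", "copy_to": "content_embedding"},
    }
}
print(fields_missing_inference_id(default_mapping))
```

A deployment script could refuse to create the index whenever the returned list is not empty.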

Below is an example of creating an index mapping using your own ML node with the .elser-2-elasticsearch inference endpoint.

PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch"
      },
      "content": {
        "type": "text",
        "copy_to": "content_embedding"
      }
    }
  }
}
		
  1. The name of the field to contain the generated embeddings for semantic search.
  2. The field to contain the embeddings is a semantic_text field.
  3. The .elser-2-elasticsearch preconfigured inference endpoint for the elasticsearch service is used.
  4. The name of the field to contain the original text for lexical search.
  5. The textual data stored in the content field is copied to content_embedding and processed by the inference endpoint.
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "mappings": {
         "properties": {
           "content_embedding": {
             "type": "semantic_text",
             "inference_id": ".elser-2-elasticsearch"
           },
           "content": {
             "type": "text",
             "copy_to": "content_embedding"
           }
         }
       }
     }'
		

With your index mapping in place, you can add some data. You only need to populate the content field. Elasticsearch stores its value as text for lexical search, and copy_to duplicates that same value into the content_embedding field. Because content_embedding is of type semantic_text, Elasticsearch then sends the value to the inference endpoint and stores the resulting embeddings.

Use the _bulk API to ingest the sample documents:

POST _bulk
{ "index": { "_index": "semantic-embeddings", "_id": "1" } }
{ "content": "After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness." }
{ "index": { "_index": "semantic-embeddings", "_id": "2" } }
{ "content": "Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions." }
{ "index": { "_index": "semantic-embeddings", "_id": "3" } }
{ "content": "Tune cluster performance by monitoring thread pools and refresh interval." }
		
curl -X POST "${ELASTICSEARCH_URL}/_bulk" \
     -H "Content-Type: application/x-ndjson" \
     -H "Authorization: ApiKey ${API_KEY}" \
     --data-binary @- << 'EOF'
{ "index": { "_index": "semantic-embeddings", "_id": "1" } }
{ "content": "After running, cool down with light cardio for a few minutes to lower your heart rate and reduce muscle soreness." }
{ "index": { "_index": "semantic-embeddings", "_id": "2" } }
{ "content": "Marathon plans stress weekly mileage; carb loading before a race does not replace recovery between hard sessions." }
{ "index": { "_index": "semantic-embeddings", "_id": "3" } }
{ "content": "Tune cluster performance by monitoring thread pools and refresh interval." }
EOF
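
When the documents come from a program rather than a hand-written file, the newline-delimited body can be generated. The sketch below assumes documents are held as a mapping from ID to text; the function name build_bulk_body is hypothetical. It emits one action line and one source line per document and ends with the trailing newline that the _bulk API requires.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON _bulk body that indexes each text into the content field."""
    lines = []
    for doc_id, content in docs.items():
        # Action line: which index and document ID to write to.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: only content is set; copy_to fills content_embedding.
        lines.append(json.dumps({"content": content}))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "semantic-embeddings",
    {"3": "Tune cluster performance by monitoring thread pools and refresh interval."},
)
print(body, end="")
```

Send the result with the Content-Type header set to application/x-ndjson, as in the curl example.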
		

If you encounter errors, check that your index mapping and inference endpoint are configured correctly.
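
A _bulk request can return HTTP 200 even when individual documents fail, so it helps to inspect the response body. The following sketch assumes the response has been parsed into a dictionary with the usual top-level errors flag and per-document items list. The function name bulk_failures is hypothetical, and the sample response is illustrative, not captured output.

```python
def bulk_failures(response):
    """Return (doc_id, error_type, reason) for each failed bulk item."""
    if not response.get("errors"):
        return []
    failures = []
    for item in response.get("items", []):
        # Each item has a single key naming the action, such as "index".
        for result in item.values():
            error = result.get("error")
            if error:
                failures.append(
                    (result.get("_id"), error.get("type"), error.get("reason"))
                )
    return failures


# Illustrative response shape; the error type and reason are placeholders.
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400,
                   "error": {"type": "example_error", "reason": "example reason"}}},
    ],
}
print(bulk_failures(sample_response))
```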

Now that you have data in your index, you can run hybrid search to combine lexical matches on content with vector search over content_embedding. You can choose between retrievers or ES|QL syntax.

Both the retriever and ES|QL approaches return hits ranked by a score that fuses lexical matches on content with semantic matches on content_embedding. Passages that match on both signals rank highest, followed by those that match on only one.

Note

For recommended ways to query and retrieve semantic_text data, refer to Search and retrieve semantic_text fields.

Retrievers provide a structured way to define and combine different search strategies, such as lexical and semantic search, within a single _search request. This example uses the RRF retriever, which merges two standard retrievers: one runs a lexical match on content, the other a match on content_embedding for semantic retrieval.

GET semantic-embeddings/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "match": {
                "content": "How to avoid muscle soreness while running?"
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "match": {
                "content_embedding": "How to avoid muscle soreness while running?"
              }
            }
          }
        }
      ]
    }
  }
}
		
  1. The first standard retriever represents the traditional lexical search.
  2. Lexical search is performed on the content field using the specified phrase.
  3. The second standard retriever runs a match query on content_embedding, which performs semantic retrieval for that field type.
  4. The same natural-language phrase is used as in the lexical branch. Elasticsearch scores content_embedding using semantic retrieval rather than term overlap alone.
curl -X GET "${ELASTICSEARCH_URL}/semantic-embeddings/_search" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "retriever": {
         "rrf": {
           "retrievers": [
             {
               "standard": {
                 "query": {
                   "match": {
                     "content": "How to avoid muscle soreness while running?"
                   }
                 }
               }
             },
             {
               "standard": {
                 "query": {
                   "match": {
                     "content_embedding": "How to avoid muscle soreness while running?"
                   }
                 }
               }
             }
           ]
         }
       }
     }'
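
To build intuition for how reciprocal rank fusion merges the two result lists, the calculation can be sketched outside Elasticsearch. This is a simplified illustration rather than the server implementation. It assumes each document scores 1 / (rank_constant + rank) per list in which it appears, with ranks starting at 1 and a rank constant of 60; the function name rrf_fuse and the two input rankings are hypothetical.

```python
def rrf_fuse(rankings, rank_constant=60):
    """Fuse ranked lists of document IDs using reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document gains more score the higher it ranks in each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical rankings for illustration only, not real query output.
lexical_ranking = ["1", "2"]
semantic_ranking = ["1", "2", "3"]

for doc_id, score in rrf_fuse([lexical_ranking, semantic_ranking]):
    print(doc_id, round(score, 4))
```

Because only ranks are used, the lexical and semantic scores never need to be on the same scale, and documents that appear in both lists accumulate two contributions.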
		

ES|QL is a piped query language that supports both lexical and semantic search, so you can combine keyword matching, vector search, scoring, and result processing in a single query.

POST /_query?format=txt
{
  "query": """
    FROM semantic-embeddings METADATA _score
    | WHERE content: "muscle soreness running?" OR match(content_embedding, "How to avoid muscle soreness while running?", { "boost": 0.75 })
    | KEEP content, content_embedding
    | SORT _score DESC
    | LIMIT 1000
  """
}
		
  1. The METADATA _score clause returns the relevance score of each document.
  2. The match (:) operator matches keywords on content. match() runs semantic retrieval on content_embedding with boost 0.75.
  3. KEEP selects content and content_embedding columns for the text-formatted response.
  4. Sorts by descending score and limits to 1000 results.
curl -X POST "${ELASTICSEARCH_URL}/_query?format=txt" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "query": "FROM semantic-embeddings METADATA _score | WHERE content: \"muscle soreness running?\" OR match(content_embedding, \"How to avoid muscle soreness while running?\", { \"boost\": 0.75 }) | KEEP content, content_embedding | SORT _score DESC | LIMIT 1000"
     }'
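
The curl version needs every double quote inside the query escaped by hand, which is easy to get wrong. One option is to assemble the query in code and let a JSON encoder handle the outer escaping. The sketch below is illustrative: the function names esql_string and build_esql_body are hypothetical, and the helper assumes that escaping backslashes and double quotes is sufficient for the search text you pass in.

```python
import json


def esql_string(text):
    """Quote text as an ES|QL string literal (assumes \\ and \" escapes suffice)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_esql_body(keywords, question, boost=0.75, limit=1000):
    """Return the JSON request body for the hybrid ES|QL query in this tutorial."""
    query = (
        "FROM semantic-embeddings METADATA _score"
        f" | WHERE content: {esql_string(keywords)}"
        f' OR match(content_embedding, {esql_string(question)}, {{ "boost": {boost} }})'
        " | KEEP content, content_embedding"
        " | SORT _score DESC"
        f" | LIMIT {limit}"
    )
    # json.dumps escapes the quotes inside the query for the outer JSON document.
    return json.dumps({"query": query})


print(build_esql_body(
    "muscle soreness running?",
    "How to avoid muscle soreness while running?",
))
```

The printed body can be sent to the _query endpoint with the same headers as the curl example.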