---
title: Semantic search with semantic_text
description: This tutorial shows you how to use the semantic text feature to perform semantic search on your data. Semantic text simplifies the inference workflow...
url: https://www.elastic.co/elastic/docs-builder/docs/3016/solutions/search/semantic-search/semantic-search-semantic-text
products:
  - Elasticsearch
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Semantic search with semantic_text
This tutorial shows you how to use the semantic text feature to perform semantic search on your data.
Semantic text simplifies the inference workflow by performing inference at ingestion time and providing sensible defaults automatically. You don't need to define model-related settings and parameters, or create inference ingest pipelines.
The recommended way to use [semantic search](https://www.elastic.co/elastic/docs-builder/docs/3016/solutions/search/semantic-search) in the Elastic Stack is to follow the `semantic_text` workflow. When you need more control over indexing and query settings, you can still use the complete inference workflow (refer to [this tutorial](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/elastic-inference/inference-api) to review the process).
This tutorial uses the [Elastic Inference Service (EIS)](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/elastic-inference/eis), but you can use any service and model supported by the [Inference API](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/elastic-inference/inference-api).

## Requirements

- This tutorial uses the [Elastic Inference Service (EIS)](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/elastic-inference/eis), which is automatically enabled on Elastic Cloud Hosted deployments and Serverless projects.

<note>
  You can also use [EIS for self-managed clusters](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/elastic-inference/connect-self-managed-cluster-to-eis).
</note>

- To use the `semantic_text` field type with an inference service other than Elastic Inference Service, you must create an inference endpoint using the [Create inference API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
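  For example, a request along the following lines creates an ELSER endpoint that runs on your own ML nodes (the endpoint name `my-elser-endpoint` is an arbitrary example; adjust the allocation settings to your deployment):

  ```json
  PUT _inference/sparse_embedding/my-elser-endpoint
  {
    "service": "elasticsearch",
    "service_settings": {
      "adaptive_allocations": {
        "enabled": true,
        "min_number_of_allocations": 1,
        "max_number_of_allocations": 4
      },
      "num_threads": 1,
      "model_id": ".elser_model_2"
    }
  }
  ```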


## Create the index mapping

Create the mapping of the destination index - the index that will contain the embeddings the inference endpoint generates from your input text. The destination index must have a field with the [`semantic_text`](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/elasticsearch/mapping-reference/semantic-text) field type to index the output of the inference endpoint.
You can run inference either using the [Elastic Inference Service](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/elastic-inference/eis) or on your own ML nodes. The following examples show both scenarios.
<tab-set>
  <tab-item title="Using EIS">
    ```json
    PUT semantic-embeddings
    {
      "mappings": {
        "properties": {
          "content": { <1>
            "type": "semantic_text" <2>
          }
        }
      }
    }
    ```

    1. The name of the field that will contain the generated embeddings.
    2. The field must be of type `semantic_text`. Because no `inference_id` is specified, the default endpoint is used, which runs on the Elastic Inference Service.
  </tab-item>

  <tab-item title="Using ML-nodes">
    ```json
    PUT semantic-embeddings
    {
      "mappings": {
        "properties": {
          "content": { <1>
            "type": "semantic_text", <2>
            "inference_id": ".elser-2-elasticsearch" <3>
          }
        }
      }
    }
    ```

    1. The name of the field that will contain the generated embeddings.
    2. The field must be of type `semantic_text`.
    3. The `inference_id` of the endpoint to use. `.elser-2-elasticsearch` is the preconfigured ELSER endpoint that runs on your cluster's ML nodes.
  </tab-item>
</tab-set>


### Optimizing vector storage with `index_options`

When using `semantic_text` with dense vector embeddings (such as E5 or other text embedding models), you can optimize storage and search performance by configuring `index_options` on the underlying `dense_vector` field. This is particularly useful for large-scale deployments. The `index_options` parameter is only applicable when using inference endpoints that produce dense vector embeddings (like E5, OpenAI embeddings, Cohere embeddings, and others). It does not apply to sparse vector models like ELSER, which use a different internal representation.
The `index_options` parameter controls how vectors are indexed and stored. For dense vector embeddings, you can specify [quantization strategies](https://www.elastic.co/blog/vector-search-elasticsearch-rationale) like [Better Binary Quantization (BBQ)](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/elasticsearch/mapping-reference/bbq) that significantly reduce memory footprint while maintaining search quality. Quantization compresses high-dimensional vectors into more efficient representations, enabling faster searches and reduced memory consumption. For details on available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/elasticsearch/mapping-reference/dense-vector#dense-vector-index-options).

#### Choose a quantization strategy

For most production use cases using `semantic_text` with dense vector embeddings from text models (like E5, OpenAI, or Cohere), BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. BBQ requires a minimum of 64 dimensions and works best with text embeddings (it might not perform well with other types like image embeddings). Choose from:
- `bbq_hnsw` - Best for most use cases (default for 384+ dimensions)
- `bbq_flat` - BBQ without HNSW for smaller datasets
- `bbq_disk` - Disk-based storage for large datasets with minimal memory requirements <applies-to>Elastic Stack: Generally available since 9.2</applies-to>
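A back-of-the-envelope sketch of the memory savings (illustrative arithmetic only; actual index sizes vary with version, settings, and per-vector correction data, which is ignored here):

```python
# Compare raw float32 storage vs. BBQ for 1 million 384-dimensional vectors.
num_vectors = 1_000_000
dims = 384

float32_bytes = num_vectors * dims * 4   # 4 bytes per float32 dimension
bbq_bytes = num_vectors * dims // 8      # ~1 bit per dimension under BBQ

print(f"float32: {float32_bytes / 1e9:.3f} GB")      # 1.536 GB
print(f"BBQ:     {bbq_bytes / 1e6:.0f} MB")          # 48 MB
print(f"reduction: ~{float32_bytes // bbq_bytes}x")  # ~32x
```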


#### Use BBQ with HNSW

Here's an example using `semantic_text` with a text embedding inference endpoint and BBQ quantization:
```json
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch", <1>
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw" <2>
          }
        }
      }
    }
  }
}
```

1. The preconfigured E5 endpoint, which produces dense vector embeddings.
2. Enables Better Binary Quantization with HNSW indexing.


#### Use BBQ without HNSW

You can also use `bbq_flat` for smaller datasets where you need maximum accuracy at the expense of speed:
```json
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_flat" <1>
          }
        }
      }
    }
  }
}
```

1. `bbq_flat` applies BBQ without building an HNSW graph, scanning the quantized vectors directly.


#### Use DiskBBQ for large datasets

<applies-to>
  - Elastic Cloud Serverless: Unavailable
  - Elastic Stack: Generally available since 9.2
</applies-to>

For large datasets where RAM is constrained, use `bbq_disk` (DiskBBQ) to minimize memory usage:
```json
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_disk" <1>
          }
        }
      }
    }
  }
}
```

1. `bbq_disk` keeps the quantized vectors on disk rather than holding them in memory.


#### Use integer quantization

Other quantization options include `int8_hnsw` (8-bit integer quantization) and `int4_hnsw` (4-bit integer quantization):
```json
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "int8_hnsw" <1>
          }
        }
      }
    }
  }
}
```

1. `int8_hnsw` quantizes each vector dimension to an 8-bit integer, reducing memory by roughly 4x compared to float32.


#### Tune HNSW parameters

For HNSW-specific tuning parameters like `m` and `ef_construction`, you can include them in the `index_options`:
```json
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw",
            "m": 32, <1>
            "ef_construction": 200 <2>
          }
        }
      }
    }
  }
}
```

1. `m`: the number of neighbors each node is connected to in the HNSW graph (default: 16). Higher values improve recall at the cost of memory and indexing speed.
2. `ef_construction`: the number of candidates tracked while assembling the list of nearest neighbors for each node (default: 100). Higher values improve graph quality at the cost of indexing speed.

<note>
  If you're using web crawlers or connectors to generate indices, you have to [update the index mappings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping) for these indices to include the `semantic_text` field. Once the mapping is updated, you'll need to run a full web crawl or a full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling semantic search on the updated data.
</note>
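For example, adding a `semantic_text` field to an existing connector index might look like the following (`my-connector-index` and the field name `content` are placeholders; use your own index and field names):

```json
PUT my-connector-index/_mapping
{
  "properties": {
    "content": {
      "type": "semantic_text"
    }
  }
}
```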


## Load data

In this step, you load the data from which you will later create embeddings.
Use the `msmarco-passagetest2019-top1000` data set, which is a subset of the MS MARCO Passage Ranking data set. It consists of 200 queries, each accompanied by a list of relevant text passages. All unique passages, along with their IDs, have been extracted from that data set and compiled into a [tsv file](https://github.com/elastic/stack-docs/blob/main/docs/en/stack/ml/nlp/data/msmarco-passagetest2019-unique.tsv).
Download the file and upload it to your cluster using the [Data Visualizer](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/ingest/upload-data-files) in the Machine Learning UI. After your data is analyzed, click **Override settings**. Under **Edit field names**, assign `id` to the first column and `content` to the second. Click **Apply**, then **Import**. Name the index `test-data`, and click **Import**. After the upload is complete, you will see an index named `test-data` with 182,469 documents.
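If you prefer to prepare the data programmatically instead of using the Data Visualizer, the TSV rows (`id<TAB>passage`) can be converted into bulk API actions. A minimal sketch (the helper name and the single made-up row are illustrative):

```python
import json

def tsv_to_bulk_lines(tsv_text: str, index: str = "test-data") -> str:
    """Convert 'id<TAB>passage' rows into Elasticsearch bulk-API NDJSON."""
    lines = []
    for row in tsv_text.strip().splitlines():
        doc_id, content = row.split("\t", 1)
        # Each document needs an action line followed by the source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"id": doc_id, "content": content}))
    return "\n".join(lines) + "\n"

# Example with a single row; the result can be sent to the _bulk endpoint.
body = tsv_to_bulk_lines("0\tExample passage text.")
print(body)
```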

## Reindex the data

Create the embeddings from the text by reindexing the data from the `test-data` index to the `semantic-embeddings` index. The data in the `content` field will be reindexed into the `content` semantic text field of the destination index. The reindexed data will be processed by the inference endpoint associated with the `content` semantic text field.
<note>
  This step uses the reindex API to simulate data ingestion. If you are working with data that has already been indexed, rather than using the test-data set, reindexing is required to ensure that the data is processed by the inference endpoint and the necessary embeddings are generated.
</note>

```json
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 10 <1>
  },
  "dest": {
    "index": "semantic-embeddings"
  }
}
```

1. The default batch size for reindexing is 1000. Setting `size` to a smaller value makes each batch quicker, which lets you cancel the process early and generate embeddings for only a subset of the data.

The call returns a task ID to monitor the progress:
```json
GET _tasks/<task_id>
```

Reindexing large datasets can take a long time. You can test this workflow using only a subset of the dataset by cancelling the reindexing process and generating embeddings only for the subset that was already reindexed. The following API request cancels the reindexing task:
```json
POST _tasks/<task_id>/_cancel
```


## Semantic search

After the data has been indexed with the embeddings, you can query the data using semantic search. Choose between [Query DSL](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/query-filter/languages/querydsl) or [ES|QL](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/query-languages/esql) syntax to execute the query.
<tab-set>
  <tab-item title="Query DSL">
    The Query DSL approach uses the [`match` query](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/query-languages/query-dsl/query-dsl-match-query) type with the `semantic_text` field:
    ```json
    GET semantic-embeddings/_search
    {
      "query": {
        "match": {
          "content": { 
            "query": "What causes muscle soreness after running?" 
          }
        }
      }
    }
    ```
  </tab-item>

  <tab-item title="ES|QL">
    The ES|QL approach uses the [match (`:`) operator](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/query-languages/esql/functions-operators/operators#esql-match-operator), which automatically detects the `semantic_text` field and performs the search on it. The query uses `METADATA _score` to sort by `_score` in descending order.
    ```json
    POST _query
    {
      "query": """
        FROM semantic-embeddings METADATA _score <1>
        | WHERE content: "How to avoid muscle soreness while running?" <2>
        | SORT _score DESC <3>
        | LIMIT 1000 <4>
      """
    }
    ```

    1. `METADATA _score` makes the relevance score available for sorting.
    2. The match operator (`:`) runs a semantic query against the `semantic_text` field.
    3. Sorts the results by relevance score in descending order.
    4. Caps the number of returned rows (1000 is also the ES|QL default limit).
  </tab-item>
</tab-set>


## Further examples and reading

- For an overview of all query types supported by `semantic_text` fields and guidance on when to use them, see [Querying `semantic_text` fields](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/elasticsearch/mapping-reference/semantic-text-search-retrieval).
- If you want to use `semantic_text` in hybrid search, refer to [this notebook](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/search/09-semantic-text.ipynb) for a step-by-step guide.
- For more information on how to optimize your ELSER endpoints, refer to [the ELSER recommendations](/elastic/docs-builder/docs/3016/explore-analyze/machine-learning/nlp/ml-nlp-elser#elser-recommendations) section in the model documentation.
- To learn more about model autoscaling, refer to the [trained model autoscaling](https://www.elastic.co/elastic/docs-builder/docs/3016/deploy-manage/autoscaling/trained-model-autoscaling) page.