﻿---
title: Optimize vector storage for semantic search
description: Reduce the memory footprint of dense vector embeddings in semantic search by configuring quantization strategies on semantic_text fields.
url: https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/5668/solutions/search/semantic-search/vector-storage-for-semantic-search
products:
  - Elasticsearch
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Optimize vector storage for semantic search

When scaling semantic search, the memory footprint of dense vector embeddings can become a primary concern. You can optimize storage and search performance for your `semantic_text` indexes by configuring the `index_options` parameter on the underlying `dense_vector` field. The `index_options` parameter controls how vectors are indexed and stored. You can specify [quantization strategies](https://www.elastic.co/blog/vector-search-elasticsearch-rationale) like [Better Binary Quantization (BBQ)](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/mapping-reference/bbq) that compress high-dimensional vectors into more efficient representations, achieving up to 32x memory reduction while maintaining search quality.
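The 32x figure follows from simple storage arithmetic: float32 embeddings use 32 bits per dimension, while BBQ stores roughly 1 bit per dimension. As an illustration (ignoring small per-vector overheads such as correction factors), using the 384 dimensions produced by `multilingual-e5-small`:

```python
# Raw float32 storage vs. BBQ (~1 bit per dimension).
# Small per-vector overheads (e.g. correction factors) are ignored.
def float32_bytes(dims: int) -> int:
    return dims * 4  # 4 bytes (32 bits) per dimension

def bbq_bytes(dims: int) -> int:
    return dims // 8  # ~1 bit per dimension

dims = 384  # multilingual-e5-small output dimensions
print(float32_bytes(dims))                      # 1536 bytes per vector
print(bbq_bytes(dims))                          # 48 bytes per vector
print(float32_bytes(dims) // bbq_bytes(dims))   # 32x reduction
```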

## Before you begin

- You need a `semantic_text` field that uses an inference endpoint producing **dense vector embeddings** (such as E5, OpenAI embeddings, or Cohere).
- If you use a custom model, create the inference endpoint first using the [Create inference API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).

<note>
  These `index_options` do not apply to sparse vector models like ELSER, which use a different internal representation. For details on all available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/mapping-reference/dense-vector#dense-vector-index-options).
</note>


## Choose a quantization strategy

Select a quantization strategy based on your dataset size and performance requirements:

| Strategy                                                                         | Memory reduction | Best for                                                | Trade-offs                          |
|----------------------------------------------------------------------------------|------------------|---------------------------------------------------------|-------------------------------------|
| `bbq_hnsw`                                                                       | Up to 32x        | Most production use cases (default for 384+ dimensions) | Minimal accuracy loss               |
| `bbq_flat`                                                                       | Up to 32x        | Smaller datasets needing maximum accuracy               | Slower queries (brute-force search) |
| `bbq_disk` <applies-to>Elastic Stack: Generally available since 9.2</applies-to> | Up to 32x        | Large datasets with constrained RAM                     | Slower queries (disk-based)         |
| `int8_hnsw`                                                                      | 4x               | High accuracy retention                                 | Lower compression than BBQ          |
| `int4_hnsw`                                                                      | 8x               | Balance between compression and accuracy                | Some accuracy loss                  |

For most use cases with dense vector embeddings from text models, we recommend [Better Binary Quantization (BBQ)](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/mapping-reference/bbq). BBQ requires a minimum of 64 dimensions and works best with text embeddings.
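To compare strategies for a concrete workload, you can estimate the vector memory footprint from the compression ratios in the table above. This is a back-of-the-envelope sketch only; a real index adds HNSW graph structures and per-vector metadata on top of these numbers:

```python
# Approximate vector memory per strategy, derived from the
# compression ratios in the table above. Real indexes add HNSW
# graph and metadata overhead not modeled here.
COMPRESSION = {
    "bbq_hnsw": 32,
    "bbq_flat": 32,
    "bbq_disk": 32,
    "int8_hnsw": 4,
    "int4_hnsw": 8,
}

def estimated_mb(num_vectors: int, dims: int, strategy: str) -> float:
    raw_bytes = num_vectors * dims * 4  # float32 baseline
    return raw_bytes / COMPRESSION[strategy] / 1024**2

# Example: 10 million vectors at 384 dimensions
for strategy in COMPRESSION:
    print(f"{strategy}: ~{estimated_mb(10_000_000, 384, strategy):.0f} MB")
```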

## Configure your index mapping

Create an index with a `semantic_text` field and set the `index_options` to your chosen quantization strategy.
<tab-set>
  <tab-item title="BBQ with HNSW">
    Use `bbq_hnsw` for most production use cases; it is the default for dense vector fields with 384 or more dimensions:
    ```json
    {
      "mappings": {
        "properties": {
          "content": {
            "type": "semantic_text",
            "inference_id": ".multilingual-e5-small-elasticsearch", <1>
            "index_options": {
              "dense_vector": {
                "type": "bbq_hnsw" <2>
              }
            }
          }
        }
      }
    }
    ```

    1. The inference endpoint that generates dense vector embeddings for the field.
    2. Sets the quantization strategy on the underlying `dense_vector` field.
  </tab-item>

  <tab-item title="BBQ flat">
    Use `bbq_flat` for smaller datasets where you need maximum accuracy at the expense of speed:
    ```json
    {
      "mappings": {
        "properties": {
          "content": {
            "type": "semantic_text",
            "inference_id": ".multilingual-e5-small-elasticsearch",
            "index_options": {
              "dense_vector": {
                "type": "bbq_flat" <1>
              }
            }
          }
        }
      }
    }
    ```

    1. Sets the quantization strategy to `bbq_flat`, which uses brute-force search over binary-quantized vectors for maximum accuracy at the cost of query speed.
  </tab-item>

  <tab-item title="DiskBBQ">
    <applies-to>
      - Elastic Cloud Serverless: Unavailable
      - Elastic Stack: Generally available since 9.2
    </applies-to>
    For large datasets where RAM is constrained, use `bbq_disk` (DiskBBQ) to minimize memory usage:
    ```json
    {
      "mappings": {
        "properties": {
          "content": {
            "type": "semantic_text",
            "inference_id": ".multilingual-e5-small-elasticsearch",
            "index_options": {
              "dense_vector": {
                "type": "bbq_disk" <1>
              }
            }
          }
        }
      }
    }
    ```

    1. Sets the quantization strategy to `bbq_disk` (DiskBBQ), which keeps vector data on disk to minimize memory usage at the cost of query speed.
  </tab-item>

  <tab-item title="Integer quantization">
    Use `int8_hnsw` for 4x compression with high accuracy retention, or `int4_hnsw` for 8x compression with some accuracy loss:
    ```json
    {
      "mappings": {
        "properties": {
          "content": {
            "type": "semantic_text",
            "inference_id": ".multilingual-e5-small-elasticsearch",
            "index_options": {
              "dense_vector": {
                "type": "int8_hnsw" <1>
              }
            }
          }
        }
      }
    }
    ```

    1. Sets the quantization strategy to `int8_hnsw`. To trade additional accuracy for higher compression, use `int4_hnsw` instead.
  </tab-item>
</tab-set>


## Verify your configuration

Confirm that the `index_options` are applied to your index by retrieving its mapping. Replace `my-index` with the name of your index:
```json
GET my-index/_mapping
```

The response includes the `index_options` you configured under the `content` field's mapping. If the `index_options` block is missing, check that you specified it correctly in the `PUT` request.

## (Optional) Tune HNSW parameters

For HNSW-specific tuning parameters like `m` and `ef_construction`, you can include them in the `index_options`:
```json
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw",
            "m": 32, <1>
            "ef_construction": 200 <2>
          }
        }
      }
    }
  }
}
```

1. `m` sets the number of neighbors each node is connected to in the HNSW graph. Higher values improve recall at the cost of memory and indexing speed. Defaults to `16`.
2. `ef_construction` sets the number of candidates tracked while building the HNSW graph. Higher values improve graph quality at the cost of indexing speed. Defaults to `100`.


## Next steps

- Follow the [Semantic search with `semantic_text`](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/5668/solutions/search/semantic-search/semantic-search-semantic-text) tutorial to set up an end-to-end semantic search workflow.
- Combine semantic search with keyword search using [hybrid search](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/5668/solutions/search/hybrid-semantic-text).


## Related pages

- [`dense_vector` `index_options` reference](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/mapping-reference/dense-vector#dense-vector-index-options)
- [Better Binary Quantization (BBQ)](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/mapping-reference/bbq)
- [Dense vector search](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/5668/solutions/search/vector/dense-vector)
- [Trained model autoscaling](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/5668/deploy-manage/autoscaling/trained-model-autoscaling)