Vector search in Elasticsearch

Tip

New to semantic search? Start with the semantic search quickstart, which uses the managed semantic_text workflow.

For common vector search use cases and how to apply them, refer to Vector search use cases.

Vector search stores embeddings and retrieves the most similar vectors to a query. Elasticsearch functions as a vector database: it scales embedding storage and similarity search while combining that with full-text search, filters, and aggregations in one engine. This page explains the core concepts and terminology you need before working with vector search in Elasticsearch.

Core concepts

Vector database

A system designed to store vector embeddings at scale and retrieve the most similar vectors to a query embedding, typically using approximate or exact k-nearest neighbor (kNN) search. Elasticsearch functions as a vector database when you store embeddings in dense_vector or sparse_vector fields and query them for similarity. Elasticsearch can combine vector search with full-text search, structured filters, aggregations, and hybrid retrieval in one engine, so you can keep lexical, semantic, and operational queries on the same data and infrastructure.
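
As a rough illustration of what exact kNN search means, the sketch below scans every stored vector and keeps the k closest to the query by Euclidean distance. The document IDs and vectors are made up for the example, and this brute-force scan is not a description of how Elasticsearch executes a search; approximate kNN exists precisely to avoid comparing the query with every stored vector.

```python
import heapq
import math


def exact_knn(query, vectors, k):
    """Return the k (doc_id, distance) pairs closest to the query vector."""
    distances = ((doc_id, math.dist(query, vec)) for doc_id, vec in vectors.items())
    # nsmallest keeps only the k smallest distances, sorted ascending.
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])


# Hypothetical 3-dimensional embeddings keyed by document ID.
stored = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.8, 0.2, 0.1],
    "doc-c": [0.0, 0.1, 0.9],
}
nearest = exact_knn([1.0, 0.0, 0.0], stored, k=2)
print([doc_id for doc_id, _ in nearest])  # ['doc-a', 'doc-b']
```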

Vector embedding

An ordered list of numbers that represents data in a multi-dimensional space. Each number is a coordinate along one dimension. In the context of search, vector embeddings are typically generated by a machine learning model to capture semantic meaning. Content with similar meaning is mapped to nearby points in this space, so proximity between vectors indicates similarity. For example, the phrases "budget hotels" and "affordable places to stay" would have embeddings near each other even though they share no words.

In Elasticsearch, embeddings are stored in dense_vector or sparse_vector fields. Example of a dense vector (8 dimensions):

[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
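
To make "nearby points" concrete, the following sketch computes cosine similarity, one common similarity measure, between a few toy vectors. The values are chosen by hand to mimic the "budget hotels" example and are not real model output; the variable names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional values chosen by hand, not produced by a model.
budget_hotels = [0.8, 0.1, 0.6, 0.0]
affordable_stays = [0.7, 0.2, 0.7, 0.1]
tax_forms = [0.0, 0.9, 0.1, 0.8]

similar = cosine_similarity(budget_hotels, affordable_stays)
unrelated = cosine_similarity(budget_hotels, tax_forms)
print(round(similar, 2), round(unrelated, 2))
```

The first pair scores close to 1 (similar direction), while the second pair scores much lower.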

Dimensions

The number of elements in a vector. Each dimension corresponds to one coordinate in the vector space. Dense embedding models typically produce vectors with hundreds or thousands of dimensions (for example, 384, 768, or 1536). Vectors with more dimensions can capture more nuance but use more memory and compute. The dimension count is fixed by the model and must match between stored vectors and query vectors.

Dense vectors

Fixed-length arrays where every element has a value. They are produced by neural embedding models that learn to map content into a continuous space. Dense vectors capture overall semantic meaning and work well for natural language understanding, multilingual content, and rich semantic matching. They typically have hundreds or thousands of dimensions.

In Elasticsearch, dense vectors use the dense_vector field type and are queried with the knn query. You can deploy external or hosted embedding models, or bring your own pre-computed vectors.
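
A minimal sketch of what this can look like in practice, written as Python dictionaries that mirror the JSON request bodies. The field names `content` and `content_embedding` are hypothetical, the 8-dimension size only matches the toy vector used on this page, and parameter availability (for example `k` inside the `knn` query) may depend on your Elasticsearch version.

```python
import json

DIMS = 8  # assumption: must equal the output size of your embedding model

# Index mapping with a hypothetical dense_vector field.
index_body = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "content_embedding": {
                "type": "dense_vector",
                "dims": DIMS,
                "similarity": "cosine",
            },
        }
    }
}

# Query DSL body for the _search API using the knn query.
query_vector = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
search_body = {
    "query": {
        "knn": {
            "field": "content_embedding",
            "query_vector": query_vector,
            "k": 10,
            "num_candidates": 100,
        }
    }
}

# The query vector must have the same number of dimensions as the field.
mapped_dims = index_body["mappings"]["properties"]["content_embedding"]["dims"]
assert len(query_vector) == mapped_dims
print(json.dumps(search_body, indent=2))
```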

Sparse vectors

Arrays where most elements are zero. Only a small number of dimensions carry meaningful values, each corresponding to a specific term or concept. Sparse vectors are often used for lexical-style matching with semantic expansion: content is expanded into weighted terms that capture related concepts. Results tend to be more explainable because you can see which terms contributed to the match.

In Elasticsearch, sparse vectors are generated by the ELSER (Elastic Learned Sparse Encoder) model and use the sparse_vector field type. ELSER is built-in and requires no external model deployment.
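
The explainability point can be sketched with a toy scoring function: a sparse vector is represented as a mapping from terms to weights, and the score is the sum of products over shared terms. The tokens and weights below are hand-written for illustration; a learned sparse model such as ELSER produces its own tokens and weights, and this is not a description of Elasticsearch's internal scoring.

```python
def sparse_dot(query, doc):
    """Score a sparse query against a sparse document.

    Returns the total score and the contribution of each shared term.
    """
    contributions = {
        term: weight * doc[term] for term, weight in query.items() if term in doc
    }
    return sum(contributions.values()), contributions


# Hand-written token weights, not real model output.
doc_vector = {"hotel": 1.5, "cheap": 1.0, "budget": 0.5, "room": 0.5}
query_vector = {"affordable": 1.0, "budget": 2.0, "hotel": 1.0}

score, contributions = sparse_dot(query_vector, doc_vector)
print(score)          # 2.5
print(contributions)  # {'budget': 1.0, 'hotel': 1.5}
```

Only the terms present in both vectors contribute, which is why you can see exactly which terms produced the match.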

Semantic search and vector search

Elasticsearch uses vector search as the foundation for semantic search capabilities. You can work with this technology in two ways:

- The Semantic search section provides managed workflows that use vector search under the hood:
  - The semantic_text field type offers the simplest path, with automatic embedding generation and model management
  - Additional implementation options are available for more complex needs
- Vector search gives you direct access to the underlying technology:
  - Manual configuration of dense or sparse vectors
  - Flexibility to bring your own embeddings
  - Direct implementation of vector similarity matching

Once you've implemented either approach, you can combine it with full-text search to create hybrid search solutions that leverage both meaning-based and keyword-based matching.

For a broader view of when each search approach fits your goals, refer to Search approaches.
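
The difference between the two approaches shows up mainly in the mapping and in who supplies the vector at query time. The sketch below contrasts them as Python dictionaries mirroring the JSON bodies. Field names are hypothetical, the 384-dimension size is an assumption about the model, and the managed mapping assumes a default inference endpoint is available when no `inference_id` is set, which may depend on your version and deployment.

```python
# Managed workflow: semantic_text generates embeddings at index time.
managed_mapping = {
    "mappings": {"properties": {"content": {"type": "semantic_text"}}}
}
# The query text is embedded for you; no query vector is supplied.
managed_query = {
    "query": {"semantic": {"field": "content", "query": "affordable places to stay"}}
}

# Direct workflow: you define the vector field and supply vectors yourself.
direct_mapping = {
    "mappings": {
        "properties": {"content_embedding": {"type": "dense_vector", "dims": 384}}
    }
}
placeholder_vector = [0.1] * 384  # stand-in for real model output
direct_query = {
    "query": {
        "knn": {
            "field": "content_embedding",
            "query_vector": placeholder_vector,
            "k": 5,
        }
    }
}
```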

Field types and queries

You can query vector fields using Query DSL or ES|QL syntax. The table below shows which queries and functions apply to each field type in these languages.

Query DSL is the JSON request body for the _search API and related search endpoints. You describe match clauses, filters, scoring, and vector queries in nested objects.

ES|QL is a piped query language in Elasticsearch. You write a linear pipeline of commands and functions to read, filter, and score data. Learn how to use ES|QL for search.

Field type	Vector type	Query DSL query	ES|QL (search-related functions)
`dense_vector`	Dense vectors	`knn`	Find similar documents using `KNN`; compare two vectors you already have in the query using `V_COSINE`, `V_DOT_PRODUCT`, `V_HAMMING`, `V_L1_NORM`, and `V_L2_NORM`
`sparse_vector`	Sparse vectors	`sparse_vector`	No ES|QL search function targets the field.
`semantic_text`	Sparse or dense	`match`, `knn`, `sparse_vector`, `semantic`	Find similar documents using `KNN`, `MATCH`, and the match operator (`:`); when both sides are `dense_vector` values in the query, compare them using `V_COSINE`, `V_DOT_PRODUCT`, `V_HAMMING`, `V_L1_NORM`, and `V_L2_NORM`

Not all ES|QL vector functions are available for every field type in older versions. Refer to ES|QL limitations to check support for your version.

Combining search strategies

You are not limited to a single retrieval style. Search applications can combine traditional keyword search, nearest neighbor vector search, sparse learned retrieval, and reranking within the same workflow.

You can implement these combinations in one of two ways:

- Use retrievers to configure multi-stage retrieval pipelines within a single _search call.
- Use an ES|QL query that leverages FORK to run retrieval branches in parallel and FUSE to combine the results using RRF or linear combination algorithms.

Using retrievers

Use retrievers to combine multiple retrieval strategies in a single _search request.

For example, you can:

- Use retrievers to combine keyword search with knn or sparse_vector retrieval
- Use the rrf retriever to merge rankings with reciprocal rank fusion
- Use the linear retriever to combine scores with custom weights

For supported retriever types and request syntax, refer to the retrievers reference. Refer to Hybrid search for an end-to-end guide to combining lexical and semantic search.
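
As a hedged sketch of the first two items in the list above, the request body below nests a keyword retriever and a knn retriever inside an rrf retriever. It is written as a Python dictionary mirroring the JSON body of a _search request. The field names and query text are hypothetical, and the parameter values are illustrative rather than recommended settings; check the retrievers reference for the options supported in your version.

```python
import json

search_body = {
    "retriever": {
        "rrf": {
            "retrievers": [
                # Lexical branch: a standard retriever wrapping a match query.
                {"standard": {"query": {"match": {"content": "budget hotels"}}}},
                # Vector branch: kNN over a hypothetical dense_vector field.
                {
                    "knn": {
                        "field": "content_embedding",
                        "query_vector": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
                        "k": 10,
                        "num_candidates": 100,
                    }
                },
            ],
            "rank_window_size": 50,
            "rank_constant": 60,
        }
    }
}

print(json.dumps(search_body, indent=2))
```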

Using ES|QL

Use ES|QL commands to combine multiple search strategies in a single query.

For example, you can:

- Use FORK to run lexical and semantic searches in parallel
- Use FUSE with fuse_method set to RRF to merge rankings with reciprocal rank fusion
- Use FUSE with fuse_method set to LINEAR to combine scores with custom weights
- Use RERANK to apply semantic reranking to the top search results after combining them

Refer to ES|QL for search for examples using FORK and FUSE. For FUSE parameters such as fuse_method, weights, and rank_constant, refer to the FUSE command reference.
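
To show what the two fusion methods compute, here is a small Python sketch of the textbook formulas: reciprocal rank fusion sums 1 / (rank_constant + rank) across branches, and linear combination sums weighted scores. The document IDs, scores, and weights are invented, and the functions illustrate the general algorithms rather than Elasticsearch's implementation, which may normalize scores or handle ties differently.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60):
    """Reciprocal rank fusion over several ranked lists of document IDs."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def linear_fuse(scored_branches, weights):
    """Weighted sum of per-branch scores (assumed to be on comparable scales)."""
    scores = defaultdict(float)
    for branch_scores, weight in zip(scored_branches, weights):
        for doc_id, score in branch_scores.items():
            scores[doc_id] += weight * score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


lexical = ["doc-1", "doc-2", "doc-3"]
semantic = ["doc-3", "doc-1", "doc-4"]
rrf_order = [doc_id for doc_id, _ in rrf_fuse([lexical, semantic])]
print(rrf_order)  # ['doc-1', 'doc-3', 'doc-2', 'doc-4']

lexical_scores = {"doc-1": 0.5, "doc-2": 1.0}
semantic_scores = {"doc-1": 1.0, "doc-3": 0.5}
linear_order = linear_fuse([lexical_scores, semantic_scores], weights=[0.25, 0.75])
print(linear_order)  # [('doc-1', 0.875), ('doc-3', 0.375), ('doc-2', 0.25)]
```

RRF uses only rank positions, so it needs no score normalization; linear combination uses the scores themselves, so the weights only make sense when branch scores are comparable.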

For end-to-end hybrid search tutorials, refer to Hybrid search with semantic_text, and Search and filter with ES|QL.

Vector storage optimization

Embedding models usually output floating-point vectors (for example, 32 bits per dimension). At scale, these vectors consume substantial memory and can slow search. Quantization is a form of lossy compression that reduces the precision of vector values. It trades a small amount of accuracy for lower memory use and faster similarity computations. For production workloads with millions or billions of vectors, quantization is often essential to keep latency and cost manageable.

Elasticsearch offers several quantization options for dense_vector fields: BBQ (Better Binary Quantization), int8, and int4. For the full list, configuration details, and trade-offs, refer to Automatically quantize vectors for kNN search.
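
The trade-off can be sketched with some arithmetic and a deliberately simplified scalar quantizer. The corpus size (1,000,000 vectors of 768 dimensions) is an assumption, the byte counts cover raw vector values only and ignore index structures and any per-vector correction data, and `quantize_int8` is a generic illustration, not the algorithm Elasticsearch uses for int8, int4, or BBQ.

```python
def raw_vector_bytes(num_vectors, dims, bits_per_dim):
    """Bytes needed for the vector values alone, ignoring index overhead."""
    return num_vectors * dims * bits_per_dim // 8


def quantize_int8(vector):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    max_abs = max(abs(x) for x in vector) or 1.0  # avoid a zero scale
    scale = max_abs / 127
    return [round(x / scale) for x in vector], scale


def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [code * scale for code in codes]


for label, bits in [("float32", 32), ("int8", 8), ("int4", 4), ("1 bit", 1)]:
    print(label, raw_vector_bytes(1_000_000, 768, bits) / 1e6, "MB")

original = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
codes, scale = quantize_int8(original)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(codes, max_error)
```

Under these assumptions the raw values take 3,072 MB at 32 bits per dimension, 768 MB at 8 bits, 384 MB at 4 bits, and 96 MB at 1 bit, while the reconstruction error of the toy quantizer stays within half a quantization step.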

Vector embedding models

A machine learning model that converts your source data into vector embeddings. The model you choose determines the dimensionality and quality of the resulting vectors. It also constrains what types of content the system understands well. The vectors in your index and your query vectors must be generated by the same model for similarity comparisons to be meaningful.

Elasticsearch provides built-in embedding models and managed hosting:

- ELSER (Elastic Learned Sparse Encoder): sparse vector model for explainable, term-based semantic search
- E5: multilingual dense embedding model deployable in Elasticsearch
- Jina models: dense embedding models (for example, jina-embeddings-v3, jina-embeddings-v5-text-small) available through Elastic Inference Service (EIS)
- The inference API integrates with third-party embedding services. Examples include Cohere, OpenAI, Hugging Face, Amazon Bedrock, Azure OpenAI, and Google Vertex AI.

Chunking

Chunking splits large documents into smaller pieces before generating embeddings. This helps improve retrieval quality by matching queries to the most relevant parts of a document.

To learn how to configure chunking for the semantic_text field type, refer to the inference API chunking configuration. If you use your own embeddings, you are responsible for chunking your data before indexing. Refer to Bring your own dense vectors for guidance.
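
If you chunk your own data, one simple approach is a sliding window over words with some overlap between neighboring chunks. The function below is a minimal sketch of that idea; the name `chunk_words` and its default sizes are arbitrary choices for illustration, and it does not reproduce the chunking strategies applied by semantic_text.

```python
def chunk_words(text, max_words=200, overlap=50):
    """Split text into word windows of at most max_words words.

    Each chunk after the first repeats the last `overlap` words of the
    previous chunk so that context is not cut off at chunk boundaries.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the current window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk would then be embedded and indexed separately, so a query can match the most relevant part of a long document.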

Implementation guides and tutorials

Elasticsearch provides multiple ways to implement vector and semantic search, depending on how much control you need over embedding generation and retrieval.

Semantic search (managed workflows)

The Semantic search section provides managed workflows that use vector search under the hood. These approaches handle embedding generation, chunking, and model management for you, making them the simplest way to get started.

- Semantic search with semantic_text: Generate embeddings using the semantic_text field type with built-in defaults for chunking and model management.
- Hybrid search with semantic_text: Combine semantic understanding with keyword search for better relevance in real applications.
- Semantic search with the Inference API: Use custom or external embedding models and control how embeddings are generated.
- Semantic search with ELSER: Use built-in semantic search with explainable results, without external models.
- Using Cohere with Elasticsearch: Generate embeddings using Cohere models via the Inference API and combine vector, hybrid search, reranking, and RAG in a single workflow.

Advanced tutorials

These guides provide more direct or customizable approaches to working with vector search:

- kNN search in Elasticsearch: Perform vector similarity search using the dense_vector field type and k-nearest neighbor queries.
- Bring your own dense vectors: Use this if you already have embeddings and want to index and search them in Elasticsearch.
- Sparse vector search in Elasticsearch: Perform semantic search using sparse vectors with the ELSER model and the sparse_vector field type.
- Manual dense and sparse workflows: Generate embeddings at ingest time using pipelines and perform semantic or hybrid search with dense or sparse models.
- OpenAI-compatible models: Connect external or local LLMs using the Inference API to generate responses or build RAG workflows.