Vector search in Elasticsearch

Tip

New to semantic search? Start with the semantic search quickstart, which uses the managed semantic_text workflow.

For common vector search use cases and how to apply them, refer to Vector search use cases.

Vector search stores embeddings and retrieves the most similar vectors to a query. Elasticsearch functions as a vector database: it scales embedding storage and similarity search while combining that with full-text search, filters, and aggregations in one engine. This page explains the core concepts and terminology you need before working with vector search in Elasticsearch.

Vector database
A system designed to store vector embeddings at scale and retrieve the most similar vectors to a query embedding, typically using approximate or exact k-nearest neighbor (kNN) search. Elasticsearch functions as a vector database when you store embeddings in dense_vector or sparse_vector fields and query them for similarity. Elasticsearch can combine vector search with full-text search, structured filters, aggregations, and hybrid retrieval in one engine, so you can keep lexical, semantic, and operational queries on the same data and infrastructure.
Vector embedding

An ordered list of numbers that represents data in a multi-dimensional space. Each number is a coordinate along one dimension. In the context of search, vector embeddings are typically generated by a machine learning model to capture semantic meaning. Content with similar meaning is mapped to nearby points in this space, so proximity between vectors indicates similarity. For example, the phrases "budget hotels" and "affordable places to stay" would have embeddings near each other even though they share no words.

In Elasticsearch, embeddings are stored in dense_vector or sparse_vector fields. Example of a dense vector (8 dimensions):

[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
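To make "proximity indicates similarity" concrete, the following minimal sketch computes cosine similarity between toy vectors. The 4-dimensional values are invented for illustration and do not come from any real embedding model; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: the numbers are made up for this example.
budget_hotels = [0.9, 0.1, 0.8, 0.2]
affordable_stays = [0.8, 0.2, 0.9, 0.1]
quantum_physics = [0.1, 0.9, 0.1, 0.8]

similar = cosine_similarity(budget_hotels, affordable_stays)
dissimilar = cosine_similarity(budget_hotels, quantum_physics)

# The two travel phrases point in nearly the same direction,
# so their similarity is higher than that of the unrelated phrase.
print(f"travel vs travel:  {similar:.3f}")
print(f"travel vs physics: {dissimilar:.3f}")
```

Cosine similarity is only one possible measure; dot product and Euclidean distance are other common choices.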
		
Dimensions
The number of elements in a vector. Each dimension corresponds to one coordinate in the vector space. Dense embedding models typically produce vectors with hundreds or thousands of dimensions (for example, 384, 768, or 1536). Higher dimensions can capture more nuance but use more memory and compute. The dimension count is fixed by the model and must match between stored vectors and query vectors.
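As a rough illustration of how dimension count drives memory use, the sketch below estimates raw storage for unquantized vectors. It assumes 4 bytes per dimension (32-bit floats) and ignores index structures and other overhead, so treat the numbers as a lower bound rather than a sizing guide. The function name is hypothetical.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Estimate raw storage for vectors, assuming fixed-width values.

    bytes_per_value=4 assumes 32-bit floats; index overhead is not included.
    """
    return num_vectors * dims * bytes_per_value


for dims in (384, 768, 1536):
    gib = raw_vector_bytes(1_000_000, dims) / 2**30
    print(f"{dims} dimensions: about {gib:.2f} GiB for 1 million vectors")
```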
Dense vectors

Fixed-length arrays where every element has a value. They are produced by neural embedding models that learn to map content into a continuous space. Dense vectors capture overall semantic meaning and work well for natural language understanding, multilingual content, and rich semantic matching. They typically have hundreds or thousands of dimensions.

In Elasticsearch, dense vectors use the dense_vector field type and are queried with the knn query. You can deploy external or hosted embedding models, or bring your own pre-computed vectors.
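A minimal sketch of what this can look like, written as Python dictionaries that mirror the JSON request bodies. The field names (`title`, `embedding`) and the query vector are assumptions made for illustration, and parameter availability can vary by Elasticsearch version, so check the reference for your release.

```python
import json

# Hypothetical query vector; in practice it comes from your embedding model.
QUERY_VECTOR = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# Body for creating an index with a dense_vector field.
# "dims" must match the length of the vectors the model produces.
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": len(QUERY_VECTOR),
                "similarity": "cosine",
            },
        }
    }
}

# Body for a _search request that uses the knn query.
search_body = {
    "size": 5,
    "query": {
        "knn": {
            "field": "embedding",
            "query_vector": QUERY_VECTOR,
            "num_candidates": 50,
        }
    },
}

print(json.dumps(mapping_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The bodies are plain dictionaries so they can be sent with any HTTP client or passed to an Elasticsearch client library.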

Sparse vectors

Arrays where most elements are zero. Only a small number of dimensions carry meaningful values, each corresponding to a specific term or concept. Sparse vectors are often used for lexical-style matching with semantic expansion: content is expanded into weighted terms that capture related concepts. Results tend to be more explainable because you can see which terms contributed to the match.

In Elasticsearch, sparse vectors are generated by the ELSER (Elastic Learned Sparse Encoder) model and use the sparse_vector field type. ELSER is built-in and requires no external model deployment.
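The following sketch illustrates why sparse matches tend to be explainable. It models a sparse vector as a token-to-weight dictionary and scores a match as the sum of weight products over shared tokens. The tokens and weights are invented and are not ELSER output, and the scoring Elasticsearch applies internally may differ from this simplified dot product.

```python
def sparse_dot(query_weights, doc_weights):
    """Score a sparse match and report each shared token's contribution."""
    contributions = {
        token: weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    }
    return sum(contributions.values()), contributions


# Hypothetical expansions: only a few dimensions (tokens) are non-zero.
doc = {"hotel": 1.8, "cheap": 1.2, "budget": 0.9, "stay": 0.6}
query = {"affordable": 1.1, "budget": 1.0, "hotel": 0.7}

score, why = sparse_dot(query, doc)
print(f"score = {score:.2f}")
for token, part in sorted(why.items(), key=lambda item: -item[1]):
    print(f"  {token}: {part:.2f}")
```

Only tokens present in both the query and the document contribute, which is what makes it possible to see which terms produced the match.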

Elasticsearch uses vector search as the foundation for semantic search capabilities. You can work with this technology in two ways:

  1. The Semantic search section provides managed workflows that use vector search under the hood:
    • The semantic_text field type offers the simplest path with automatic embedding generation and model management
    • Additional implementation options are available for more complex needs
  2. Vector search gives you direct access to the underlying technology:
    • Manual configuration of dense or sparse vectors
    • Flexibility to bring your own embeddings
    • Direct implementation of vector similarity matching
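As a rough sketch of the managed path, the request bodies below map a semantic_text field and query it with the semantic query. The field name `content` is hypothetical, and the mapping assumes a version where semantic_text can fall back to a default inference endpoint; some versions require an explicit `inference_id`.

```python
import json

# Mapping body: embeddings are generated at ingest time by the
# inference endpoint associated with the semantic_text field.
mapping_body = {
    "mappings": {
        "properties": {
            # Assumption: no inference_id is set, so a default endpoint is used.
            "content": {"type": "semantic_text"},
        }
    }
}

# Search body: the query text is embedded for you, so no vector is supplied.
search_body = {
    "query": {
        "semantic": {
            "field": "content",
            "query": "affordable places to stay",
        }
    }
}

print(json.dumps(mapping_body, indent=2))
print(json.dumps(search_body, indent=2))
```

Compared with configuring vector fields manually, there is no dimension count, similarity setting, or query vector in these bodies.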

Once you've implemented either approach, you can combine it with full-text search to create hybrid search solutions that leverage both meaning-based and keyword-based matching.

For a broader view of when each search approach fits your goals, refer to Search approaches.

You can query vector fields using Query DSL or ES|QL syntax. The following table explains how to work with each field type in these languages.

Query DSL is the JSON request body for the _search API and related search endpoints. You describe match clauses, filters, scoring, and vector queries in nested objects.

ES|QL is a piped query language in Elasticsearch. You write a linear pipeline of commands and functions to read, filter, and score data. Learn how to use ES|QL for search.

Field type: dense_vector
  • Vector type: Dense vectors
  • Query DSL query: knn
  • ES|QL (search-related functions):
      - Find similar documents using KNN
      - Compare two vectors you already have in the query using V_COSINE, V_DOT_PRODUCT, V_HAMMING, V_L1_NORM, and V_L2_NORM

Field type: sparse_vector
  • Vector type: Sparse vectors
  • Query DSL query: sparse_vector
  • ES|QL (search-related functions): No ES|QL search function targets the field.

Field type: semantic_text
  • Vector type: Sparse or dense
  • Query DSL query: match, knn, sparse_vector, semantic
  • ES|QL (search-related functions):
      - Find similar documents using KNN, MATCH and the match operator (:)
      - When both sides are dense_vector values in the query, compare them using V_COSINE, V_DOT_PRODUCT, V_HAMMING, V_L1_NORM, and V_L2_NORM

Not all ES|QL vector functions are available for every field type in older versions. Refer to ES|QL limitations to check support for your version.

You are not limited to a single retrieval style. Search applications can combine traditional keyword search, nearest neighbor vector search, sparse learned retrieval, and reranking within the same workflow.

You can implement these combinations in one of two ways:

  • Use retrievers to configure multi-stage retrieval pipelines within a single _search call.
  • Use an ES|QL query that leverages FORK to run retrieval branches in parallel and FUSE to combine the results using RRF or linear combination algorithms.

Use retrievers to combine multiple retrieval strategies in a single _search request.

For supported retriever types and request syntax, refer to the retrievers reference. Refer to Hybrid search for an end-to-end guide to combining lexical and semantic search.
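A minimal sketch of a hybrid request body that uses retrievers, written as a Python dictionary. It combines a lexical `standard` retriever and a `knn` retriever under an `rrf` retriever. The field names, query text, and vector are assumptions made for illustration, and parameter names can differ between versions.

```python
import json

# Hypothetical query vector produced by the same model as the indexed vectors.
QUERY_VECTOR = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

search_body = {
    "retriever": {
        "rrf": {
            "retrievers": [
                # Lexical branch: a regular full-text match query.
                {"standard": {"query": {"match": {"title": "budget hotels"}}}},
                # Vector branch: nearest neighbor search on a dense_vector field.
                {
                    "knn": {
                        "field": "embedding",
                        "query_vector": QUERY_VECTOR,
                        "k": 10,
                        "num_candidates": 100,
                    }
                },
            ],
            "rank_window_size": 50,
            "rank_constant": 60,
        }
    }
}

print(json.dumps(search_body, indent=2))
```

Both branches run within one _search call, and the rrf retriever merges their rankings into a single result list.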

Use ES|QL commands to combine multiple search strategies in a single query.

For example, you can:

  • Use FORK to run lexical and semantic searches in parallel
  • Use FUSE with fuse_method set to RRF to merge rankings with reciprocal rank fusion
  • Use FUSE with fuse_method set to LINEAR to combine scores with custom weights
  • Use RERANK to apply semantic reranking to the top search results after combining them

Refer to ES|QL for search for examples using FORK and FUSE. For FUSE parameters such as fuse_method, weights, and rank_constant, refer to the FUSE command reference.
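To show what the two fusion methods compute, here is a small Python sketch of reciprocal rank fusion and a weighted linear combination. It illustrates the general algorithms only and is not the Elasticsearch implementation; the function names, document IDs, scores, and weights are made up, and the linear variant skips any score normalization.

```python
def rrf_fuse(rankings, rank_constant=60):
    """Reciprocal rank fusion: each list adds 1 / (rank_constant + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by document ID for stability.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def linear_fuse(score_maps, weights):
    """Weighted sum of per-branch scores (no normalization applied here)."""
    scores = {}
    for score_map, weight in zip(score_maps, weights):
        for doc_id, score in score_map.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + weight * score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a lexical branch and a semantic branch.
lexical_ranking = ["a", "b", "c"]
semantic_ranking = ["b", "d", "a"]
rrf_result = rrf_fuse([lexical_ranking, semantic_ranking])
print([doc_id for doc_id, _ in rrf_result])

lexical_scores = {"a": 1.0, "b": 0.5}
semantic_scores = {"b": 0.9, "d": 0.4}
linear_result = linear_fuse([lexical_scores, semantic_scores], weights=[0.3, 0.7])
print([doc_id for doc_id, _ in linear_result])
```

RRF uses only rank positions, so it does not require scores from different branches to be on comparable scales. A linear combination uses the scores directly, which is why weights and normalization matter there.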

For end-to-end hybrid search tutorials, refer to Hybrid search with semantic_text, and Search and filter with ES|QL.

Embedding models usually output floating-point vectors (for example, 32 bits per dimension). At scale, these vectors consume substantial memory and can slow search. Quantization is a form of lossy compression that reduces the precision of vector values. It trades a small amount of accuracy for lower memory use and faster similarity computations. For production workloads with millions or billions of vectors, quantization is often essential to keep latency and cost manageable.

Elasticsearch offers several quantization options for dense_vector fields: BBQ (Better Binary Quantization), int8, and int4. For the full list, configuration details, and trade-offs, refer to Automatically quantize vectors for kNN search.
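The sketch below shows the basic idea of scalar quantization to 8-bit integers: map each float onto one of 256 levels, store the level, and accept a small reconstruction error. It is a simplified illustration with hypothetical function names, not how Elasticsearch implements int8, int4, or BBQ.

```python
def quantize_int8(vector):
    """Map floats to integer codes in [-128, 127] using the vector's min and max."""
    lo, hi = min(vector), max(vector)
    # Fall back to a scale of 1.0 when all values are equal.
    scale = (hi - lo) / 255 or 1.0
    codes = [round((x - lo) / scale) - 128 for x in vector]
    return codes, lo, scale


def dequantize_int8(codes, lo, scale):
    """Approximate the original floats from the stored codes."""
    return [(code + 128) * scale + lo for code in codes]


vector = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
codes, lo, scale = quantize_int8(vector)
restored = dequantize_int8(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(vector, restored))

print(codes)
print(f"largest reconstruction error: {max_error:.6f}")
```

Each value now needs 1 byte instead of 4, at the cost of a rounding error of at most half a quantization step per dimension.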

An embedding model is a machine learning model that converts your source data into vector embeddings. The model you choose determines the dimensionality and quality of the resulting vectors. It also constrains what types of content the system understands well. The vectors in your index and your query vectors must be generated by the same model for similarity comparisons to be meaningful.

Elasticsearch provides built-in embedding models and managed hosting.

Chunking splits large documents into smaller pieces before generating embeddings. This helps improve retrieval quality by matching queries to the most relevant parts of a document.

To learn how to configure chunking for the semantic_text field type, refer to the inference API chunking configuration. If you use your own embeddings, you are responsible for chunking your data before indexing. Refer to Bring your own dense vectors for guidance.
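If you chunk your own data, one simple approach is a fixed-size word window with overlap, sketched below. The function name, window size, and overlap are assumptions for illustration. This is not the chunking strategy Elasticsearch applies for semantic_text, and sentence-aware or token-aware splitting may suit your model better.

```python
def chunk_words(text, max_words=50, overlap=10):
    """Split text into chunks of up to max_words words, sharing overlap words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, max_words=5, overlap=2):
    print(chunk)
```

The overlap keeps context that straddles a chunk boundary available in both neighboring chunks. Each chunk is then embedded and indexed as its own vector.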

Elasticsearch provides multiple ways to implement vector and semantic search, depending on how much control you need over embedding generation and retrieval.

The Semantic search section provides managed workflows that use vector search under the hood. These approaches handle embedding generation, chunking, and model management for you, making them the simplest way to get started.

These guides provide more direct or customizable approaches to working with vector search: