How vector search works
Vector search finds results based on meaning rather than exact keyword matches. This page explains the core concepts and terminology you need before working with vector search in Elasticsearch.
- Vector embedding
- An ordered list of numbers that represents data in a multi-dimensional space. Each number is a coordinate along one dimension. In the context of search, vector embeddings are typically generated by a machine learning model to capture semantic meaning. Content with similar meaning is mapped to nearby points in this space, so proximity between vectors indicates similarity. For example, the phrases "budget hotels" and "affordable places to stay" would have embeddings near each other even though they share no words.

  In Elasticsearch, embeddings are stored in `dense_vector` or `sparse_vector` fields. Example of a dense vector (8 dimensions): `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]`
- Dimensions
- The number of elements in a vector. Each dimension corresponds to one coordinate in the vector space. Dense embedding models typically produce vectors with hundreds or thousands of dimensions (for example, 384, 768, or 1536). Higher dimensions can capture more nuance but use more memory and compute. The dimension count is fixed by the model and must match between stored vectors and query vectors.
- Embedding model
- A machine learning model that converts your source data into vector embeddings. The model you choose determines the dimensionality and quality of the resulting vectors. It also constrains what types of content the system understands well. The vectors in your index and your query vectors must be generated by the same model for similarity comparisons to be meaningful.
Elasticsearch provides built-in embedding models and managed hosting:
- ELSER (Elastic Learned Sparse Encoder): sparse vector model for explainable, term-based semantic search
- E5: multilingual dense embedding model deployable in Elasticsearch
- Jina models: dense embedding models (for example, `jina-embeddings-v3`, `jina-embeddings-v5-text-small`) available through Elastic Inference Service (EIS)
The inference API integrates with third-party embedding services. Examples include Cohere, OpenAI, Hugging Face, Amazon Bedrock, Azure OpenAI, and Google Vertex AI.
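The core ideas above — proximity in vector space indicates similarity, and dimension counts must match between stored and query vectors — can be made concrete with a small sketch. The helper and the toy 4-dimensional vectors below are illustrative only, not real model output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 = similar direction, near 0.0 = unrelated."""
    if len(a) != len(b):
        # Dimension counts must match, just as stored vectors and query
        # vectors must share the same model's dimensionality.
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (hypothetical values for illustration):
budget_hotels = [0.8, 0.1, 0.6, 0.2]
affordable_stays = [0.7, 0.2, 0.5, 0.3]
quantum_physics = [0.1, 0.9, 0.1, 0.8]

print(cosine_similarity(budget_hotels, affordable_stays))  # high: nearby points
print(cosine_similarity(budget_hotels, quantum_physics))   # low: far apart
```

Real embeddings behave the same way, just with hundreds or thousands of dimensions instead of four.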
Elasticsearch supports two types of vector representations, each suited to different use cases and implementation patterns.
- Dense vectors
- Fixed-length arrays where every element has a value. They are produced by neural embedding models that learn to map content into a continuous space. Dense vectors capture overall semantic meaning and work well for natural language understanding, multilingual content, and rich semantic matching. They typically have hundreds or thousands of dimensions.

  In Elasticsearch, dense vectors use the `dense_vector` field type and are queried with the `knn` query. You can deploy external or hosted embedding models, or bring your own pre-computed vectors.
- Sparse vectors
- Arrays where most elements are zero. Only a small number of dimensions carry meaningful values, each corresponding to a specific term or concept. Sparse vectors are often used for lexical-style matching with semantic expansion: content is expanded into weighted terms that capture related concepts. Results tend to be more explainable because you can see which terms contributed to the match.

  In Elasticsearch, sparse vectors are generated by the ELSER (Elastic Learned Sparse Encoder) model and use the `sparse_vector` field type. ELSER is built-in and requires no external model deployment.
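To make the two representations concrete, here is a sketch of the request bodies involved, written as Python dicts. The index and field names (`embedding`, `ml.tokens`) and the token weights are hypothetical; the body shapes follow the Elasticsearch `dense_vector` mapping and `knn` search APIs:

```python
# Dense: a dense_vector mapping and a kNN search over it.
dense_mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 8,               # must match the embedding model
                "index": True,
                "similarity": "cosine",
            }
        }
    }
}
knn_search = {
    "knn": {
        "field": "embedding",
        "query_vector": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        "k": 10,                         # nearest neighbors to return
        "num_candidates": 100,           # candidates considered per shard
    }
}

# Sparse: a sparse_vector field stores weighted term expansions,
# such as those produced by ELSER (weights here are made up).
sparse_doc = {
    "ml.tokens": {"hotel": 1.8, "cheap": 1.2, "lodging": 0.9, "stay": 0.4}
}
```

Note how the sparse document is directly readable — you can see which expanded terms would contribute to a match — while the dense vector is an opaque point in space.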
Embedding models usually output floating-point vectors (for example, 32 bits per dimension). At scale, these vectors consume substantial memory and can slow search. Quantization is a form of lossy compression that reduces the precision of vector values. It trades a small amount of accuracy for lower memory use and faster similarity computations. For production workloads with millions or billions of vectors, quantization is often essential to keep latency and cost manageable.
Elasticsearch offers several quantization options for `dense_vector` fields: BBQ (Better Binary Quantization), int8, and int4. For the full list, configuration details, and trade-offs, refer to Automatically quantize vectors for kNN search.
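The lossy-compression trade-off can be illustrated with a minimal scalar-quantization sketch in the int8 style. Elasticsearch's own quantization is more sophisticated; this only shows the basic idea of mapping floats onto a small set of integer levels:

```python
def quantize_int8(vector: list[float]) -> tuple[list[int], float, float]:
    """Map float values onto 256 integer levels (0..255)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for flat vectors
    codes = [round((v - lo) / scale) for v in vector]
    return codes, lo, scale

def dequantize(codes: list[int], lo: float, scale: float) -> list[float]:
    """Approximately reconstruct the original floats from the codes."""
    return [lo + c * scale for c in codes]

vec = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
codes, lo, scale = quantize_int8(vec)
approx = dequantize(codes, lo, scale)

# Each value now needs 1 byte instead of 4, but reconstruction is
# close rather than exact -- the compression is lossy.
print(max(abs(a - b) for a, b in zip(vec, approx)))
```

At 4x less memory per dimension (and even more for int4 or binary schemes), the small reconstruction error is usually an acceptable price for the latency and cost savings at scale.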
Elasticsearch offers several paths for implementing vector search:

- Semantic search: managed workflows using `semantic_text` and the Inference API
- Dense vector search: manual dense vector implementation
- Sparse vector search: ELSER-based semantic search
- kNN search: approximate and exact k-nearest neighbor search