How vector search works
Vector search finds results based on meaning rather than exact keyword matches. This page explains the core concepts and terminology you need before working with vector search in Elasticsearch.
- Vector embedding
- An ordered list of numbers that represents data in a multi-dimensional space. Each number is a coordinate along one dimension. In the context of search, vector embeddings are typically generated by a machine learning model to capture semantic meaning. Content with similar meaning is mapped to nearby points in this space, so proximity between vectors indicates similarity. For example, the phrases "budget hotels" and "affordable places to stay" would have embeddings near each other even though they share no words.

  In Elasticsearch, embeddings are stored in `dense_vector` or `sparse_vector` fields. Example of a dense vector (8 dimensions): `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]`
- Dimensions
- The number of elements in a vector. Each dimension corresponds to one coordinate in the vector space. Dense embedding models typically produce vectors with hundreds or thousands of dimensions (for example, 384, 768, or 1536). Higher dimensions can capture more nuance but use more memory and compute. The dimension count is fixed by the model and must match between stored vectors and query vectors.
- Embedding model
- A machine learning model that converts your source data into vector embeddings. The model you choose determines the dimensionality and quality of the resulting vectors. It also constrains what types of content the system understands well. The vectors in your index and your query vectors must be generated by the same model for similarity comparisons to be meaningful.
Elasticsearch provides built-in embedding models and managed hosting:
- ELSER (Elastic Learned Sparse Encoder): sparse vector model for explainable, term-based semantic search
- E5: multilingual dense embedding model deployable in Elasticsearch
- Jina models: dense embedding models (for example, `jina-embeddings-v3`, `jina-embeddings-v5-text-small`) available through Elastic Inference Service (EIS)
The inference API integrates with third-party embedding services. Examples include Cohere, OpenAI, Hugging Face, Amazon Bedrock, Azure OpenAI, and Google Vertex AI.
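The core ideas above — proximity in vector space indicates similarity, and dimension counts must match between stored and query vectors — can be made concrete with a small sketch. The helper and the toy 4-dimensional vectors below are illustrative only, not real model output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 = similar direction, near 0.0 = unrelated."""
    if len(a) != len(b):
        # Dimension counts must match, just as stored vectors and query
        # vectors must share the same model's dimensionality.
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (hypothetical values for illustration):
budget_hotels = [0.8, 0.1, 0.6, 0.2]
affordable_stays = [0.7, 0.2, 0.5, 0.3]
quantum_physics = [0.1, 0.9, 0.1, 0.8]

print(cosine_similarity(budget_hotels, affordable_stays))  # high: nearby points
print(cosine_similarity(budget_hotels, quantum_physics))   # low: far apart
```

Real embeddings behave the same way, just with hundreds or thousands of dimensions instead of four.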
Elasticsearch supports two types of vector representations, each suited to different use cases and implementation patterns.
- Dense vectors
- Fixed-length arrays where every element has a value. They are produced by neural embedding models that learn to map content into a continuous space. Dense vectors capture overall semantic meaning and work well for natural language understanding, multilingual content, and rich semantic matching. They typically have hundreds or thousands of dimensions.

  In Elasticsearch, dense vectors use the `dense_vector` field type and are queried with the `knn` query. You can deploy external or hosted embedding models, or bring your own pre-computed vectors.
- Sparse vectors
- Arrays where most elements are zero. Only a small number of dimensions carry meaningful values, each corresponding to a specific term or concept. Sparse vectors are often used for lexical-style matching with semantic expansion: content is expanded into weighted terms that capture related concepts. Results tend to be more explainable because you can see which terms contributed to the match.

  In Elasticsearch, sparse vectors are generated by the ELSER (Elastic Learned Sparse Encoder) model and use the `sparse_vector` field type. ELSER is built-in and requires no external model deployment.
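To make the two representations concrete, here is a sketch of the request bodies involved, written as Python dicts. The index and field names (`embedding`, `ml.tokens`) and the token weights are hypothetical; the body shapes follow the Elasticsearch `dense_vector` mapping and `knn` search APIs:

```python
# Dense: a dense_vector mapping and a kNN search over it.
dense_mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 8,               # must match the embedding model
                "index": True,
                "similarity": "cosine",
            }
        }
    }
}
knn_search = {
    "knn": {
        "field": "embedding",
        "query_vector": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        "k": 10,                         # nearest neighbors to return
        "num_candidates": 100,           # candidates considered per shard
    }
}

# Sparse: a sparse_vector field stores weighted term expansions,
# such as those produced by ELSER (weights here are made up).
sparse_doc = {
    "ml.tokens": {"hotel": 1.8, "cheap": 1.2, "lodging": 0.9, "stay": 0.4}
}
```

Note how the sparse document is directly readable — you can see which expanded terms would contribute to a match — while the dense vector is an opaque point in space.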
Embedding models usually output floating-point vectors (for example, 32 bits per dimension). At scale, these vectors consume substantial memory and can slow search. Quantization is a form of lossy compression that reduces the precision of vector values. It trades a small amount of accuracy for lower memory use and faster similarity computations. For production workloads with millions or billions of vectors, quantization is often essential to keep latency and cost manageable.
Elasticsearch offers several quantization options for `dense_vector` fields: BBQ (Better Binary Quantization), int8, and int4. For the full list, configuration details, and trade-offs, refer to Automatically quantize vectors for kNN search.
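The lossy-compression trade-off can be illustrated with a minimal scalar-quantization sketch in the int8 style. Elasticsearch's own quantization is more sophisticated; this only shows the basic idea of mapping floats onto a small set of integer levels:

```python
def quantize_int8(vector: list[float]) -> tuple[list[int], float, float]:
    """Map float values onto 256 integer levels (0..255)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for flat vectors
    codes = [round((v - lo) / scale) for v in vector]
    return codes, lo, scale

def dequantize(codes: list[int], lo: float, scale: float) -> list[float]:
    """Approximately reconstruct the original floats from the codes."""
    return [lo + c * scale for c in codes]

vec = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
codes, lo, scale = quantize_int8(vec)
approx = dequantize(codes, lo, scale)

# Each value now needs 1 byte instead of 4, but reconstruction is
# close rather than exact -- the compression is lossy.
print(max(abs(a - b) for a, b in zip(vec, approx)))
```

At 4x less memory per dimension (and even more for int4 or binary schemes), the small reconstruction error is usually an acceptable price for the latency and cost savings at scale.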
Elasticsearch offers several paths for implementing vector search:

- Semantic search: managed workflows using `semantic_text` and the Inference API
- Dense vector search: manual dense vector implementation
- Sparse vector search: ELSER-based semantic search
- kNN search: approximate and exact k-nearest neighbor search