Vector search in Elasticsearch
Sometimes full-text search alone isn't enough. Machine learning techniques help you find data based on intent and contextual meaning, not just keywords. Vector search is the foundation for these capabilities in Elasticsearch.
Vector search uses machine learning models to convert content into numerical representations called vector embeddings. These embeddings capture meaning and relationships, enabling Elasticsearch to retrieve results based on similarity rather than exact term matches.
New to vector search? Start with the `semantic_text` workflow, which provides an easy-to-use abstraction over vector search with sensible defaults and automatic model management. Learn more in this hands-on tutorial.
To understand the core concepts behind vector search, including vectors, embeddings, similarity, and the difference between dense and sparse approaches, refer to How vector search works.
Vector search enables a wide range of applications:
- Natural language search: Let users search in everyday language and get results based on meaning, not just keywords.
- Retrieval Augmented Generation (RAG): Retrieve relevant documents from Elasticsearch and feed them into a large language model (LLM) to generate grounded, context-aware answers.
- Question answering: Match natural language questions to the most relevant answers in your data.
- Content recommendations: Suggest related articles, products, or media based on vector similarity.
- Large-scale information retrieval: Search across millions or billions of documents efficiently.
- Product discovery: Help users find products that match their intent, even when they don't use exact product terms.
- Workplace document search: Search internal knowledge bases, wikis, and documents by meaning rather than exact keywords.
- Image and multimedia similarity: Find visually or semantically similar images, audio, or video by comparing their vector representations.
You can combine vector search with full-text search for hybrid search that leverages both meaning-based and keyword-based matching.
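As an illustration of that hybrid pattern, the request body below sketches a reciprocal rank fusion (RRF) retriever that merges a lexical `match` query with a `semantic` query. The index, field names (`title`, `content`), and query text are assumptions for the example; `content` is assumed to be a `semantic_text` field.

```python
# Hybrid search sketch: RRF fuses the ranked results of a keyword query
# and a meaning-based semantic query into one result list.
# Field names and query strings are illustrative.
hybrid_request = {
    "retriever": {
        "rrf": {  # reciprocal rank fusion over the child retrievers
            "retrievers": [
                # Lexical leg: classic full-text relevance on "title"
                {"standard": {"query": {"match": {"title": "vector search"}}}},
                # Semantic leg: embedding similarity on a semantic_text field
                {"standard": {"query": {"semantic": {"field": "content",
                                                     "query": "vector search"}}}},
            ]
        }
    }
}
```

With the Python client this body would be passed as the request to a search call against your index; RRF needs no score calibration between the two legs, which is why it is a common default for hybrid search.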
Elasticsearch offers several ways to implement vector search. Your choice depends on how much control you need and what type of content you are searching.
Semantic search workflows are managed and require minimal configuration. They handle embedding generation and model management for you. Choose semantic search when:
- You want to get started quickly with natural language search
- You prefer Elastic to manage models and indexing defaults
- Your use case is text-based and fits common patterns (document search, RAG, question answering)
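A minimal sketch of the managed workflow: mapping a field as `semantic_text` is essentially all the configuration required, since Elasticsearch handles chunking, embedding generation, and model deployment behind the field type. The index name is an assumption for the example.

```python
# Managed semantic search sketch: one semantic_text field in the mapping.
# With no explicit inference_id, Elasticsearch uses its default
# inference endpoint for this field.
semantic_mapping = {
    "mappings": {
        "properties": {
            "content": {"type": "semantic_text"}
        }
    }
}

# With the Python client (index name assumed):
# es.indices.create(index="my-semantic-index", body=semantic_mapping)
```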
Direct vector search uses the `dense_vector` and `sparse_vector` field types. Choose this when:
- You already have pre-computed embeddings or generate them outside Elasticsearch
- You need to search non-text content (images, audio) with embeddings from external models
- You require fine-grained control over indexing, quantization, or query parameters
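For the bring-your-own-embeddings path, the sketch below maps a `dense_vector` field and runs a kNN search with a pre-computed vector. The field name, dimension count (384, typical of small sentence-embedding models), and query vector are illustrative assumptions; the real `query_vector` must have exactly `dims` values.

```python
# Direct dense vector sketch: store externally generated embeddings.
byo_mapping = {
    "mappings": {
        "properties": {
            "image_embedding": {
                "type": "dense_vector",
                "dims": 384,             # must match your model's output size
                "index": True,           # build an HNSW index for approximate kNN
                "similarity": "cosine",  # how vector closeness is scored
            }
        }
    }
}

# Approximate kNN query with a pre-computed embedding.
# query_vector is truncated here for readability; in practice it must
# contain all 384 values.
knn_request = {
    "knn": {
        "field": "image_embedding",
        "query_vector": [0.12, -0.45, 0.07],
        "k": 10,                 # number of neighbors to return
        "num_candidates": 100,   # candidates examined per shard (recall knob)
    }
}
```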
Resources are grouped by implementation path. Try out our tutorials in Start here for a quick win, or jump to the workflow that matches how much control you need.
- Get started with semantic search: Set up hybrid search using `semantic_text` with dense vector embeddings. The recommended starting point.
- How vector search works: Core concepts: vectors, embeddings, dimensions, similarity, dense vs. sparse vectors, and quantization.
Use `semantic_text`, the inference APIs, or ELSER for semantic search with managed embedding generation and model deployment.
- Semantic search with `semantic_text`: Implement semantic search with automatic embedding generation and model management.
- Hybrid search with `semantic_text`: Combine vector search with full-text search using reciprocal rank fusion.
- Semantic search with the inference API: Configure inference endpoints for more control over embedding generation.
- Semantic search with ELSER: Deploy the ELSER sparse vector model and build a semantic search pipeline.
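As a sketch of the sparse side, the body below queries a field holding ELSER token-weight pairs with a `sparse_vector` query. It assumes documents were ingested through an inference pipeline that wrote ELSER output into a field named `content_embedding`; the field name and the inference endpoint ID are assumptions for the example.

```python
# Sparse vector query sketch against an ELSER-populated field.
# ELSER expands the query text into weighted terms, giving explainable,
# term-based semantic matching.
sparse_request = {
    "query": {
        "sparse_vector": {
            "field": "content_embedding",          # assumed field name
            "inference_id": "my-elser-endpoint",   # assumed endpoint ID
            "query": "how do I improve recall?",
        }
    }
}
```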
Work directly with the `dense_vector` and `sparse_vector` field types when you need more control over indexing, quantization, and query parameters.
- Bring your own dense vectors to Elasticsearch: Store and search pre-computed dense vectors using the `dense_vector` field type.
- Dense vector search in Elasticsearch: How dense vectors capture semantic meaning using neural embeddings, and how to use them in Elasticsearch.
- Sparse vector search in Elasticsearch: How ELSER generates sparse vectors for explainable, term-based semantic matching.
- Tutorial: Dense and sparse workflows using ingest pipelines: A side-by-side walkthrough of dense and sparse vector ingest pipelines.
Build multi-stage retrieval and improve result ranking.
- kNN search in Elasticsearch: Run approximate and exact k-nearest neighbor searches, with filtering, multi-kNN, and nested vector support.
- Retrievers: Compose multi-stage retrieval pipelines that combine different search strategies in a single request.
- Semantic reranking: Rerank search results using a cross-encoder model to improve relevance after initial retrieval.
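Putting retrievers and reranking together, this sketch composes a two-stage pipeline: a BM25 first pass, then a cross-encoder rerank of the top window via a `text_similarity_reranker` retriever. The field name, query text, window size, and the `inference_id` of the reranker endpoint are all assumptions for the example.

```python
# Multi-stage retrieval sketch: cheap lexical retrieval first, then an
# expensive but more accurate reranking model over only the top results.
rerank_request = {
    "retriever": {
        "text_similarity_reranker": {
            # First stage: standard BM25 retrieval
            "retriever": {
                "standard": {"query": {"match": {"content": "quarterly revenue"}}}
            },
            "field": "content",                       # text fed to the reranker
            "inference_id": "my-rerank-endpoint",     # assumed endpoint ID
            "inference_text": "quarterly revenue",    # reranking query text
            "rank_window_size": 50,                   # how many docs to rerank
        }
    }
}
```

Restricting reranking to a window (here 50 documents) keeps the cross-encoder's per-document cost from dominating query latency.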
Learn about the models and services that power vector search in Elasticsearch.
- ELSER: Elastic's built-in sparse vector model for semantic search with explainable, term-based matching.
- E5: A multilingual dense embedding model that can be deployed directly in Elasticsearch.
- Elastic Inference Service: A managed service for running machine learning models for embedding generation and other NLP tasks.
- Search and compare text: Use deployed NLP models to search and compare text at query time.
- Text embedding and semantic search: Deploy a text embedding model and use it for vector search, from model setup to query.
- Using Cohere with Elasticsearch: Generate embeddings and perform semantic search using Cohere's models.
Tune vector search for production performance.
- Tune approximate kNN search: Optimize vector search performance by tuning quantization, HNSW parameters, memory, and recall tradeoffs.
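The main tuning knobs live in the mapping, as this sketch shows: a quantized index type plus HNSW graph parameters. The field name and the specific values are illustrative starting points, not recommendations; `int8_hnsw` trades a small amount of recall for roughly 4x less vector memory versus float32.

```python
# Tuning sketch: scalar quantization and HNSW parameters in the mapping.
# Larger m / ef_construction improve recall at the cost of index size
# and build time; num_candidates at query time is the other recall knob.
tuned_mapping = {
    "mappings": {
        "properties": {
            "vec": {
                "type": "dense_vector",
                "dims": 768,                    # assumed model output size
                "index": True,
                "similarity": "dot_product",
                "index_options": {
                    "type": "int8_hnsw",        # 8-bit scalar quantization
                    "m": 16,                    # HNSW graph connectivity
                    "ef_construction": 100,     # build-time beam width
                },
            }
        }
    }
}
```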