Vector search use cases
Sometimes full-text search alone is not enough. Machine learning helps you find data by meaning, not only by matching keywords. Vector search is how Elasticsearch supports these workloads.
This page describes common vector search use cases and how to implement them.
New to vector search? You might want to start with the managed semantic_text workflow.
Choose a search strategy based on the following:
- Embeddings: For meaning-based text search, start with managed workflows such as semantic_text. When you need full control over models, embeddings, or non-text vectors, configure vector search directly. To understand the differences and choose the right approach, refer to Semantic search and vector search.
- Query interface: Send requests with the Search API and Query DSL or ES|QL for search. Use the same approach at index time and at search time.
- Combine strategies: To rank keyword and vector results together, use Hybrid search or retrievers in a single Search API request.
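As a rough illustration of combining strategies, the sketch below builds a Search API request body that fuses a keyword query and a kNN query with a reciprocal rank fusion (rrf) retriever. The field names content and content_vector and the helper name build_hybrid_request are hypothetical, and the query vector is assumed to come from the same model that produced the indexed vectors.

```python
from __future__ import annotations

from typing import Any


def build_hybrid_request(
    query_text: str,
    query_vector: list[float],
    text_field: str = "content",            # hypothetical field name
    vector_field: str = "content_vector",   # hypothetical field name
    k: int = 10,
    num_candidates: int = 100,
) -> dict[str, Any]:
    """Build a Search API body that ranks keyword and kNN hits together with RRF."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Keyword side: a standard retriever wrapping a match query.
                    {"standard": {"query": {"match": {text_field: query_text}}}},
                    # Vector side: a kNN retriever over the dense_vector field.
                    {
                        "knn": {
                            "field": vector_field,
                            "query_vector": query_vector,
                            "k": k,
                            "num_candidates": num_candidates,
                        }
                    },
                ]
            }
        },
        "size": k,
    }
```

You would send the returned dictionary as the body of a single Search API request, for example through an Elasticsearch client of your choice.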
Use Elasticsearch to find relevant passages in your documents, wikis, tickets, or knowledge bases, then pass those passages to a language model. The model can answer using your data instead of only its training data. This fits internal assistants, support bots, and tools that must cite sources.
-
Learn how RAG works in Elasticsearch
Read how retrieval, chunking, and orchestration fit together.
-
Set up search for your documents
Split long documents into smaller chunks so each search hit is a useful passage. Refer to How to implement retrieval to choose your embedding approach, query interface, and search strategy.
-
Generate answers with an LLM
Send the top search hits and their text fields to your model or orchestration layer.
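The chunking and answer-generation steps above can be sketched in plain Python. This is a minimal example, not a recommended chunking strategy: it splits on whitespace with a fixed overlap, and it assumes each search hit stores its passage in a hypothetical text field. The helper names chunk_words and build_prompt are illustrative only.

```python
from __future__ import annotations

from typing import Any


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word-based chunks."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_prompt(question: str, hits: list[dict[str, Any]], text_field: str = "text") -> str:
    """Format the top hits as numbered passages so the model can cite them."""
    passages = [
        f"[{i}] (doc {hit['_id']}) {hit['_source'][text_field]}"
        for i, hit in enumerate(hits, start=1)
    ]
    context = "\n".join(passages)
    return (
        "Answer using only the passages below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Each chunk would be indexed as its own document or passage, and the hits passed to build_prompt are assumed to have the _id and _source shape of Search API results.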
Find related products, articles, videos, or other items when keywords alone do not match well. Examples include "similar products," "you may also like," and matching users or players in an app.
-
Store embeddings for each item
Each item needs a vector from the same model so similarity scores are comparable. Refer to How to implement retrieval for your embedding approach.
-
Search for similar items
Use the vector of the current item (or a user profile vector) as the search input. Run a k-nearest neighbor (kNN) query to get the closest matches. On large catalogs, adjust k and num_candidates to balance speed and quality. Refer to How to implement retrieval for your query interface and search strategy.
-
Limit results with filters
A filter is a rule on structured fields in your index, such as "in stock," "region = EU," or "category = shoes." It narrows which documents kNN considers. Without filters, similarity search might return items the user cannot buy or see.
Add a filter clause to your kNN request so only matching documents are returned. This is important for catalogs where most items are out of scope for a given user.
-
Improve the result order
The closest vectors are not always the best final ranking. You can boost by popularity or recency, or rescore the top results. Refer to How to implement retrieval for how to combine search strategies.
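The search and filter steps above might look like the following request builder. It is a sketch under assumptions: the embedding field name and the example filter fields are hypothetical, and the right values for k and num_candidates depend on your catalog and latency budget.

```python
from __future__ import annotations

from typing import Any


def build_similar_items_request(
    item_vector: list[float],
    k: int = 10,
    num_candidates: int = 100,
    filters: list[dict[str, Any]] | None = None,
    vector_field: str = "embedding",  # hypothetical field name
) -> dict[str, Any]:
    """Build a kNN search body, optionally restricted by structured filters."""
    if num_candidates < k:
        # Fewer candidates than k cannot yield k results per shard.
        raise ValueError("num_candidates must not be smaller than k")
    knn: dict[str, Any] = {
        "field": vector_field,
        "query_vector": item_vector,
        "k": k,
        "num_candidates": num_candidates,
    }
    if filters:
        # The filter narrows which documents kNN considers.
        knn["filter"] = {"bool": {"filter": filters}}
    return {"knn": knn, "size": k}
```

For example, passing filters=[{"term": {"in_stock": True}}, {"term": {"region": "EU"}}] keeps the search within items the user can actually buy. Raising num_candidates generally improves recall at the cost of speed.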
Search images, audio, video, or text when your content uses more than one type. For example, search with text to find images, or search with an image to find similar images.
The steps below use the Inference API to embed multimodal content. Refer to How to implement retrieval for other embedding approaches.
-
Create an inference endpoint
Create an endpoint with a model that supports your media types (text, images, audio, or video). Use the embedding task type for multimodal models. Use the same endpoint ID when you ingest documents and when you run a search.
-
Add an index mapping and ingest pipeline
Define a dense_vector field for embeddings and any other fields you need for filters (category, license, date). In the same tutorial, add an ingest pipeline with an inference processor that calls your endpoint, then load your documents so each one is embedded at index time.
-
Run kNN search
Use kNN with a query_vector_builder so Elasticsearch embeds the user's query with the same model, then returns the closest vectors. Add a filter on structured fields when you need to limit results by category or other rules. Refer to How to implement retrieval for your query interface and search strategy.
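The mapping and search steps above could be sketched as two request bodies. Several details here are assumptions: the field name embedding, the cosine similarity setting, and the vector dimension all depend on your model, and the text_embedding form of query_vector_builder shown here is the one documented for text embedding models. Whether it applies unchanged to a multimodal endpoint is not confirmed by this page, so check the kNN search reference for your version.

```python
from __future__ import annotations

from typing import Any


def build_media_index_body(dims: int, vector_field: str = "embedding") -> dict[str, Any]:
    """Index body with a dense_vector field plus structured fields for filters."""
    return {
        "mappings": {
            "properties": {
                vector_field: {"type": "dense_vector", "dims": dims, "similarity": "cosine"},
                "category": {"type": "keyword"},
                "license": {"type": "keyword"},
                "date": {"type": "date"},
            }
        }
    }


def build_text_to_media_request(
    query_text: str,
    endpoint_id: str,                 # the same endpoint ID used at ingest time
    category: str | None = None,
    k: int = 10,
    num_candidates: int = 100,
    vector_field: str = "embedding",  # hypothetical field name
) -> dict[str, Any]:
    """kNN body that asks Elasticsearch to embed the query text at search time."""
    knn: dict[str, Any] = {
        "field": vector_field,
        "k": k,
        "num_candidates": num_candidates,
        # Assumed builder shape; see the lead-in for the caveat.
        "query_vector_builder": {
            "text_embedding": {"model_id": endpoint_id, "model_text": query_text}
        },
    }
    if category is not None:
        knn["filter"] = {"term": {"category": category}}
    return {"knn": knn, "size": k}
```

Using the builder keeps query embedding on the server, so the application never needs to call the model directly and cannot drift to a different model than the one used at ingest.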
Compare documents, accounts, or events to find near-duplicates, suspicious matches, or unusual patterns that exact matching would miss. Examples include duplicate articles, fraudulent transactions, and operational outliers.
-
Clean records before embedding
Use an ingest pipeline to normalize the fields you embed before they are indexed. The goal is to avoid separate vectors for content that only differs in formatting.
-
Store one vector per record you compare
Index one embedding per document, account snapshot, or time window you want to compare. Refer to How to implement retrieval for your embedding approach.
-
Find neighbors and apply thresholds
Run kNN for each new or suspect record. In your application, use the similarity score to decide what to do: mark pairs above a threshold as duplicates, block submissions close to a known fraud example, or raise an alert when neighbors look unusual. Refer to How to implement retrieval for your query interface and search strategy.
-
Act on matches in your pipeline
Run scheduled checks on new data, write matches to a review index, or combine vector results with aggregations (for example, count duplicates per URL). For time-series patterns that are not vector-based, you can also use anomaly detection in Elasticsearch.
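The thresholding step above happens in your application, after the kNN search returns. The sketch below sorts neighbors into "duplicate" and "review" buckets by score. The threshold values and the helper name classify_neighbors are hypothetical, and the hits are assumed to have the _id and _score shape of Search API results. The scale of _score depends on the similarity configured on the vector field, so tune thresholds against labeled examples from your own data.

```python
from __future__ import annotations

from typing import Any


def classify_neighbors(
    hits: list[dict[str, Any]],
    duplicate_threshold: float = 0.95,  # hypothetical value, tune on your data
    review_threshold: float = 0.85,     # hypothetical value, tune on your data
) -> dict[str, list[str]]:
    """Bucket kNN hits by similarity score."""
    if review_threshold > duplicate_threshold:
        raise ValueError("review_threshold must not exceed duplicate_threshold")
    result: dict[str, list[str]] = {"duplicate": [], "review": []}
    for hit in hits:
        score = hit["_score"]
        if score >= duplicate_threshold:
            result["duplicate"].append(hit["_id"])
        elif score >= review_threshold:
            result["review"].append(hit["_id"])
        # Anything below review_threshold is treated as unrelated.
    return result
```

The "review" bucket is one way to feed a review index: write those pairs out for a human or a second check instead of acting on them automatically.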
Store facts, chat turns, or summaries so an assistant can load relevant past context without sending the full chat history every time.
-
Design the memory index
Store a user or session ID, a timestamp, and optional expiry fields. Decide whether each stored item is a full message, a short fact, or a summary so search returns the right level of detail.
-
Index new memories with embeddings
Use the same embedding setup at index time and at search time. Refer to How to implement retrieval for your embedding approach.
-
Retrieve memories for each new message
Restrict search to the current user or session, then run semantic or kNN search on the new message. Pass the top hits to your application with the user's latest input. Refer to How to implement retrieval for your query interface and search strategy.
-
Remove or merge old memories
Delete or roll up outdated entries in your app, or use index lifecycle management so the index stays accurate and does not grow without limit.
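The memory design and retrieval steps above can be sketched as a document builder and a scoped kNN request. All field names (user_id, kind, text, embedding, created_at, expires_at) are hypothetical, and the message vector is assumed to come from the same embedding setup used at index time.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Any


def build_memory_doc(
    user_id: str,
    text: str,
    vector: list[float],
    kind: str = "fact",            # for example "message", "fact", or "summary"
    ttl_days: int | None = None,
    now: datetime | None = None,
) -> dict[str, Any]:
    """Build one memory document with an optional expiry timestamp."""
    now = now or datetime.now(timezone.utc)
    doc: dict[str, Any] = {
        "user_id": user_id,
        "kind": kind,
        "text": text,
        "embedding": vector,
        "created_at": now.isoformat(),
    }
    if ttl_days is not None:
        doc["expires_at"] = (now + timedelta(days=ttl_days)).isoformat()
    return doc


def build_memory_search(
    user_id: str,
    message_vector: list[float],
    k: int = 5,
    num_candidates: int = 50,
    now: datetime | None = None,
) -> dict[str, Any]:
    """kNN body restricted to one user's unexpired memories."""
    now = now or datetime.now(timezone.utc)
    not_expired = {
        "bool": {
            "should": [
                {"bool": {"must_not": {"exists": {"field": "expires_at"}}}},
                {"range": {"expires_at": {"gt": now.isoformat()}}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "knn": {
            "field": "embedding",
            "query_vector": message_vector,
            "k": k,
            "num_candidates": num_candidates,
            "filter": {"bool": {"filter": [{"term": {"user_id": user_id}}, not_expired]}},
        },
        "size": k,
    }
```

Filtering on expires_at at query time only hides stale entries. Deleting or rolling them up, as described in the last step, is still what keeps the index from growing without limit.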