Vector search use cases

Sometimes full-text search alone is not enough. Machine learning helps you find data by meaning, not only by matching keywords. Vector search is how Elasticsearch supports these workloads.

This page describes common vector search use cases and how to implement them.

Tip

New to vector search? You might want to start with the managed semantic_text workflow.

Choose a search strategy based on your use case. Each use case below starts with a short description, followed by the steps to implement it.

Use Elasticsearch to find relevant passages in your documents, wikis, tickets, or knowledge bases, then pass those passages to a language model. The model can answer using your data instead of only its training data. This fits internal assistants, support bots, and tools that must cite sources.

  1. Learn how RAG works in Elasticsearch

    Read how retrieval, chunking, and orchestration fit together.

  2. Set up search for your documents

    Split long documents into smaller chunks so each search hit is a useful passage. Refer to How to implement retrieval to choose your embedding approach, query interface, and search strategy.

  3. Generate answers with an LLM

    Send the top search hits and their text fields to your model or orchestration layer.
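
The chunking and prompt-assembly steps above can be sketched in Python. This is a minimal illustration, not a recommended implementation: the word-based splitter, the `text` field name, and the prompt wording are all assumptions, and the hits are assumed to have the usual `_id` and `_source` shape of a search response.

```python
from typing import Dict, List


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so a sentence that straddles
    a boundary still appears whole in at least one chunk.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_prompt(question: str, hits: List[Dict]) -> str:
    """Number each retrieved passage so the model can cite its sources."""
    passages = [
        # "text" is a hypothetical field name for the stored passage.
        f"[{i}] ({hit['_id']}) {hit['_source']['text']}"
        for i, hit in enumerate(hits, start=1)
    ]
    return (
        "Answer using only the passages below and cite them by number.\n\n"
        + "\n".join(passages)
        + f"\n\nQuestion: {question}"
    )
```

Including the document ID next to each passage is one way to let the application map a cited number back to its source.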

Find related products, articles, videos, or other items when keywords alone do not match well. Examples include "similar products," "you may also like," and matching users or players in an app.

  1. Store embeddings for each item

    Each item needs a vector from the same model so similarity scores are comparable. Refer to How to implement retrieval for your embedding approach.

  2. Search for similar items

    Use the vector of the current item (or a user profile vector) as the search input. Run a k-nearest neighbor (kNN) query to get the closest matches. On large catalogs, adjust k and num_candidates to balance speed and quality. Refer to How to implement retrieval for your query interface and search strategy.

  3. Limit results with filters

    A filter is a rule on structured fields in your index, such as "in stock," "region = EU," or "category = shoes." It narrows which documents kNN considers. Without filters, similarity search might return items the user cannot buy or see.

    Add a filter clause to your kNN request so only matching documents are returned. This is important for catalogs where most items are out of scope for a given user.

  4. Improve the result order

    The closest vectors are not always the best final ranking. You can boost by popularity or recency, or rescore the top results. Refer to How to implement retrieval for how to combine search strategies.
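
As a rough sketch of steps 2 to 4, the Python below builds a kNN search body with an optional filter and then reorders hits in the application. The field names `embedding` and `popularity`, the default values, and the blending weight are assumptions chosen for illustration. The client-side rerank is only one simple option and is not an Elasticsearch feature.

```python
from typing import Dict, List, Optional


def build_knn_search(
    query_vector: List[float],
    k: int = 10,
    num_candidates: int = 100,
    filters: Optional[List[Dict]] = None,
) -> Dict:
    """Build a search body that uses the top-level knn option."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    knn: Dict = {
        "field": "embedding",  # hypothetical dense_vector field name
        "query_vector": query_vector,
        "k": k,  # number of nearest neighbors to return
        "num_candidates": num_candidates,  # more candidates: better recall, slower
    }
    if filters:
        # Only documents that match every filter clause are considered.
        knn["filter"] = {"bool": {"filter": filters}}
    return {"knn": knn, "size": k}


def rerank_by_popularity(hits: List[Dict], weight: float = 0.2) -> List[Dict]:
    """Blend the similarity score with a popularity value assumed to be in [0, 1]."""

    def blended(hit: Dict) -> float:
        popularity = hit["_source"].get("popularity", 0.0)
        return (1 - weight) * hit["_score"] + weight * popularity

    return sorted(hits, key=blended, reverse=True)
```

For example, `build_knn_search(item_vector, filters=[{"term": {"in_stock": True}}])` would restrict the neighbors to in-stock items, assuming the index has an `in_stock` field.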

Search images, audio, video, or text when your content uses more than one type. For example, search with text to find images, or search with an image to find similar images.

The steps below use the Inference API to embed multimodal content. Refer to How to implement retrieval for other embedding approaches.

  1. Create an inference endpoint

    Create an endpoint with a model that supports your media types (text, images, audio, or video). Use the embedding task type for multimodal models. Use the same endpoint ID when you ingest documents and when you run a search.

  2. Add an index mapping and ingest pipeline

    Define a dense_vector field for embeddings and any other fields you need for filters (category, license, date). Then add an ingest pipeline with an inference processor that calls your endpoint, and load your documents so each one is embedded at index time.

  3. Run a kNN search

    Use kNN with a query_vector_builder so Elasticsearch embeds the user's query with the same model, then returns the closest vectors. Add a filter on structured fields when you need to limit results by category or other rules. Refer to How to implement retrieval for your query interface and search strategy.
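
The request bodies for these steps might look roughly like the Python dictionaries below. Treat this as a sketch under assumptions: the endpoint ID, the field names `content` and `embedding`, and the dimension count are hypothetical, and the query uses the `text_embedding` query vector builder for a text query. Whether the same shapes apply unchanged to your multimodal endpoint is also an assumption, so check them against the reference documentation for your version.

```python
from typing import Dict, Optional

ENDPOINT_ID = "my-multimodal-endpoint"  # hypothetical inference endpoint ID
DIMS = 512  # assumption: must equal the output dimensions of your model

# Mapping: one vector field plus structured fields used for filtering.
index_mappings: Dict = {
    "properties": {
        "embedding": {"type": "dense_vector", "dims": DIMS, "similarity": "cosine"},
        "category": {"type": "keyword"},
        "license": {"type": "keyword"},
        "date": {"type": "date"},
    }
}

# Ingest pipeline: the inference processor calls the endpoint at index time.
ingest_pipeline: Dict = {
    "processors": [
        {
            "inference": {
                "model_id": ENDPOINT_ID,
                "input_output": [
                    # "content" is a hypothetical source field to embed.
                    {"input_field": "content", "output_field": "embedding"}
                ],
            }
        }
    ]
}


def build_text_to_media_search(
    query_text: str, category: Optional[str] = None, k: int = 10
) -> Dict:
    """Build a kNN search whose query text is embedded by the same endpoint."""
    knn: Dict = {
        "field": "embedding",
        "k": k,
        "num_candidates": k * 10,  # illustrative ratio, tune for your data
        "query_vector_builder": {
            "text_embedding": {"model_id": ENDPOINT_ID, "model_text": query_text}
        },
    }
    if category is not None:
        knn["filter"] = {"term": {"category": category}}
    return {"knn": knn}
```

Keeping the endpoint ID in one constant makes it harder to embed documents and queries with different models by accident.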

Compare documents, accounts, or events to find near-duplicates, suspicious matches, or unusual patterns that exact matching would miss. Examples include duplicate articles, fraudulent transactions, and operational outliers.

  1. Clean records before embedding

    Use an ingest pipeline to normalize the fields you embed before they are indexed. The goal is to avoid separate vectors for content that only differs in formatting.

  2. Store one vector per record you compare

    Index one embedding per document, account snapshot, or time window you want to compare. Refer to How to implement retrieval for your embedding approach.

  3. Find neighbors and apply thresholds

    Run kNN for each new or suspect record. In your application, use the similarity score to decide what to do: mark pairs above a threshold as duplicates, block submissions close to a known fraud example, or raise an alert when neighbors look unusual. Refer to How to implement retrieval for your query interface and search strategy.

  4. Act on matches in your pipeline

    Run scheduled checks on new data, write matches to a review index, or combine vector results with aggregations (for example, count duplicates per URL). For time-series patterns that are not vector-based, you can also use anomaly detection in Elasticsearch.
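
One way to picture steps 1 and 3 is the sketch below: a normalization pipeline built from the `gsub`, `trim`, and `lowercase` ingest processors, and an application-side function that turns a neighbor's score into a decision. The field name `body` and both threshold values are placeholders, not recommendations. The score conversion assumes the vector field uses cosine similarity, where Elasticsearch reports `_score` as `(1 + cosine) / 2`.

```python
from typing import Dict

# Ingest pipeline that normalizes the text before it is embedded.
# "body" is a hypothetical name for the field you embed.
normalize_pipeline: Dict = {
    "processors": [
        {"gsub": {"field": "body", "pattern": r"\s+", "replacement": " "}},
        {"trim": {"field": "body"}},
        {"lowercase": {"field": "body"}},
    ]
}


def cosine_from_score(score: float) -> float:
    """Recover the raw cosine value from a cosine-similarity kNN _score."""
    return 2.0 * score - 1.0


def classify_neighbor(
    score: float, duplicate_at: float = 0.95, review_at: float = 0.85
) -> str:
    """Map a similarity score to an action; both thresholds are placeholders."""
    if score >= duplicate_at:
        return "duplicate"
    if score >= review_at:
        return "review"
    return "distinct"
```

Thresholds like these usually need tuning on labeled pairs from your own data, because score distributions differ between embedding models.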

Store facts, chat turns, or summaries so an assistant can load relevant past context without sending the full chat history every time.

  1. Design the memory index

    Store a user or session ID, a timestamp, and optional expiry fields. Decide whether each stored item is a full message, a short fact, or a summary so search returns the right level of detail.

  2. Index new memories with embeddings

    Use the same embedding setup at index time and at search time. Refer to How to implement retrieval for your embedding approach.

  3. Retrieve memories for each new message

    Restrict search to the current user or session, then run semantic or kNN search on the new message. Pass the top hits to your application with the user's latest input. Refer to How to implement retrieval for your query interface and search strategy.

  4. Remove or merge old memories

    Delete or roll up outdated entries in your app, or use index lifecycle management so the index stays accurate and does not grow without limit.
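
The memory design and retrieval steps could be sketched as follows. The field names (`user_id`, `kind`, `text`, `created_at`, `expires_at`, `embedding`) are hypothetical, the embedding is assumed to be added elsewhere at index time, and the candidate count is an illustrative default.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def build_memory_doc(
    user_id: str, text: str, kind: str, now: datetime, ttl_days: Optional[int] = None
) -> Dict:
    """Build one memory document; kind is "message", "fact", or "summary"."""
    doc = {
        "user_id": user_id,
        "kind": kind,
        "text": text,
        "created_at": now.isoformat(),
    }
    if ttl_days is not None:
        doc["expires_at"] = (now + timedelta(days=ttl_days)).isoformat()
    return doc


def build_memory_search(
    user_id: str, query_vector: List[float], now: datetime, k: int = 5
) -> Dict:
    """Build a kNN search restricted to one user's unexpired memories."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": max(50, k * 10),
            "filter": {
                "bool": {
                    "filter": [{"term": {"user_id": user_id}}],
                    # Documents without expires_at never match this range,
                    # so memories with no expiry are kept.
                    "must_not": [
                        {"range": {"expires_at": {"lte": now.isoformat()}}}
                    ],
                }
            },
        },
        "_source": ["text", "kind", "created_at"],
    }
```

Filtering expired entries at query time keeps results correct even if cleanup runs late; it does not replace deleting or rolling up old entries.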