Elastic Inference Service

Elastic Inference Service (EIS) enables you to leverage AI-powered search as a service without deploying a model in your environment. With EIS, you don't need to manage the infrastructure and resources required for machine learning inference by adding, configuring, and scaling machine learning nodes. Instead, you can use machine learning models for ingest, search, and chat independently of your Elasticsearch infrastructure.

You can use EIS with your self-managed cluster through Cloud Connect. For details, refer to EIS for self-managed clusters.

AI features powered by EIS

Your Elastic deployment or project comes with Elastic Managed LLMs by default. These can be used in Agent Builder, the AI Assistant, Attack Discovery, Automatic Import and Search Playground. For the list of available models, refer to the documentation.
You can use ELSER to perform semantic search as a service (ELSER on EIS).
You can use the jina-embeddings-v3 multilingual dense vector embedding model to perform semantic search through the Elastic Inference Service.

Supported models

The following tables list the models supported by Elastic Inference Service by model type.

The corresponding Kibana connectors and inference endpoints for these models are created automatically. To customize the configuration, you can create your own connectors or inference endpoints.

Note

The Inference Regions column shows the regions where inference requests are processed and where data is sent.

LLM chat models

Scroll horizontally to view more information.

Author	Name	ID	Model Card	Provider Terms	Input Modalities	Output Modalities	EOL Date	Data Used To Train Models?	Inference Regions	Release Status	Stack Version
Anthropic	Claude Opus 4.5	anthropic-claude-4.5-opus	Claude Opus 4.5 System Card	Google Terms AWS Terms	Text	Text	2026-11-24	No	US	Generally Available	9.3
Anthropic	Claude Opus 4.6	anthropic-claude-4.6-opus	Claude Opus 4.6 System Card	Google Terms AWS Terms	Text	Text	2027-02-05	No	US	Generally Available	9.3
Anthropic	Claude Sonnet 3.7	anthropic-claude-3.7-sonnet	Anthropic Claude 3.7 Sonnet	AWS terms	Text	Text	2026-04-28	No	US	Legacy (EOL Soon)	9.2
Anthropic	Claude Sonnet 4.5	anthropic-claude-4.5-sonnet	Anthropic Claude 4.5 Sonnet	AWS terms	Text	Text	2026-09-29	No	US	Generally Available	9.2
Google	Gemini 2.5 Flash	google-gemini-2.5-flash	Google Gemini 2.5 Flash	Google terms	Text	Text	2027-06-17	No	US	Generally Available	9.3
Google	Gemini 2.5 Pro	google-gemini-2.5-pro	Google Gemini 2.5 Pro	Google terms	Text	Text	2027-06-17	No	US	Generally Available	9.3
OpenAI	GPT-4.1	openai-gpt-4.1	OpenAI GPT 4.1	Microsoft Terms	Text	Text	2027-04-11	No	US	Generally Available	9.3
OpenAI	GPT-4.1 Mini	openai-gpt-4.1-mini	OpenAI GPT 4.1 Mini	Microsoft Terms	Text	Text	2027-04-11	No	US	Generally Available	9.3
OpenAI	GPT-5.2	openai-gpt-5.2	OpenAI GPT 5.2	Microsoft Terms	Text	Text	2027-05-12	No	US	Generally Available	9.3
OpenAI	GPT-OSS 120B	openai-gpt-oss-120b	OpenAI GPT-OSS-120B	Google Terms Together AI Terms DeepInfra Terms AWS Terms	Text	Text		No	US	Generally Available	9.3

Embedding models

Scroll horizontally to view more information.

Author	Name	ID	Model Card	Provider Terms	Input Modalities	Output Modalities	Data Retention Period (Days)	Data Used To Train Models?	Inference Regions	Release Status	Stack Version
Elastic	ELSER v2	elser_model_2	ELSER docs	Elastic Terms	Text	Embedding	0	No	US	Generally Available	9.1
Jina	Embeddings v3	jina-embeddings-v3	jina-embeddings-v3	Elastic Terms	Text	Embedding	0	No	US	Generally Available	9.3
Jina	Embeddings v5 Text Nano	jina-embeddings-v5-text-nano	jina-embeddings-v5-text-nano	Elastic Terms	Text	Embedding	0	No	US	Generally Available	9.3
Jina	Embeddings v5 Text Small	jina-embeddings-v5-text-small	jina-embeddings-v5-text-small	Elastic Terms	Text	Embedding	0	No	US	Generally Available	9.3
Google	Gemini Embedding 001	google-gemini-embedding-001	Gemini Embedding 001	Google terms	Text	Text	55 days	No	US	Generally Available	9.3
OpenAI	Text Embedding 003 Large	openai-text-embedding-3-large	Text Embedding 003 Large	OpenAI terms	Text	Text	Unknown	No	US	Generally Available	9.3
OpenAI	Text Embedding 003 Small	openai-text-embedding-3-small	Text Embedding 003 Small	OpenAI terms	Text	Text	Unknown	No	US	Generally Available	9.3

Rerankers

Scroll horizontally to view more information.

Author	Name	ID	Model Card	Provider Terms	Input Modalities	Output Modalities	EOL Date	Data Retention Period (Days)	Data Used To Train Models?	Inference Regions	Release Status	Stack Version
Jina	Reranker v2	jina-reranker-v2-base-multilingual	jina-reranker-v2-base-multilingual	Elastic Terms	Text	Text		0	No	US	Generally Available	9.3
Jina	Reranker v3	jina-reranker-v3	jina-reranker-v3	Elastic Terms	Text	Text		0	No	US	Generally Available	9.3

Important

The applicable terms of use, uptime, and performance for each of the AI models available with EIS are each described in the applicable AI model's Provider Terms and Model Card.
Prior to using the AI model with EIS, Customers are responsible for reviewing and agreeing to the chosen AI model's Provider Terms to understand the availability and data practices of the AI model's provider.
After the listed end-of-life (EOL) date, the model is no longer available for inference use and requests will fail. You need to actively transition to another model before the EOL date, there is no automated migration.

Region and hosting

Elastic Inference Service is currently available in a single region: AWS us-east-1. All inference requests sent through EIS are routed to this region, regardless of where your Elasticsearch deployment or Serverless project is hosted.

Depending on the model being used, request processing may involve Elastic inference infrastructure and, in some cases, trusted third-party model providers. For example, ELSER requests are processed entirely within Elastic inference infrastructure in AWS us-east-1. Other models, such as large language models or third-party embedding models, may involve additional processing by their respective model providers, which can operate in different cloud platforms or regions.

Rate limits

The service enforces rate limits on an ongoing basis. Exceeding a limit results in HTTP 429 responses from the server until the sliding window moves on further and parts of the limit resets.

Model	Request/minute	Tokens/minute (ingest)	Tokens/minute (search)	Notes
Elastic Managed LLMs	2000	-	-	No rate limit on tokens
ELSER	6,000	6,000,000	600,000	Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.
Jina Embeddings v3	6,000	6,000,000	600,000	Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.
Jina Reranker v2	50	-	500,000	Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.
Jina Reranker v3	50	-	500,000	Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.

Pricing

All models on EIS incur a charge per million tokens. The pricing details are available on our Pricing page.

This pricing model differs from the existing Machine Learning Nodes, which is billed through VCUs consumed.

Token-based billing

EIS is billed per million tokens used:

For chat models, input and output tokens are billed. Longer conversations with extensive context or detailed responses will consume more tokens.
For embeddings models, only input tokens are billed.

Tokens are the fundamental units that language models process for both input and output. Tokenizers convert text into numerical data by segmenting it into subword units. A token can be a complete word, part of a word, or a punctuation mark, depending on the model's trained tokenizer and the frequency patterns in its training data.

For example, the sentence It was the best of times, it was the worst of times. contains 52 characters but would tokenize into approximately 14 tokens with a typical word-based approach, though the exact count varies by tokenizer.

Monitor your token usage

To track your token consumption:

Navigate to Billing and subscriptions > Usage in the Elastic Cloud Console.
Look for line items where the Billing dimension is set to "Inference".

Use cases

The following sections describe how to get started with specific models available through Elastic Inference Service, including creating inference endpoints and using them for search and ingest.

`jina-embeddings-v5-text-small` on EIS

You can use the jina-embeddings-v5-text-small model through Elastic Inference Service. Running the model on EIS means that you use the model on GPUs, without the need of managing infrastructure and model resources.

Get started with `jina-embeddings-v5-text-small` on EIS

Create an inference endpoint that references the jina-embeddings-v5-text-small model in the model_id field.

						PUT _inference/text_embedding/eis-jina-embeddings-v5-text-small
					{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small"
  }
}
		
	

The created inference endpoint uses the model for inference operations on the Elastic Inference Service. You can reference the inference_id of the endpoint in index mappings for the semantic_text field type, text_embedding inference tasks, or search queries.

`jina-embeddings-v3` on EIS

You can use the jina-embeddings-v3 model through Elastic Inference Service. Running the model on EIS means that you use the model on GPUs, without the need of managing infrastructure and model resources.

Get started with `jina-embeddings-v3` on EIS

Create an inference endpoint that references the jina-embeddings-v3 model in the model_id field.

						PUT _inference/text_embedding/eis-jina-embeddings-v3
					{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v3"
  }
}
		
	

ELSER through Elastic Inference Service (ELSER on EIS)

ELSER on EIS enables you to use the ELSER model on GPUs, without having to manage your own ML nodes. We expect better performance for ingest throughput than ML nodes and equivalent performance for search latency. We will continue to benchmark, remove limitations and address concerns.

Using the ELSER on EIS endpoint

You can now use semantic_text with the new ELSER endpoint on EIS. To learn how to use the .elser-2-elastic inference endpoint, refer to Using ELSER on EIS.

Get started with semantic search with ELSER on EIS

Semantic Search with semantic_text has a detailed tutorial on using the semantic_text field and using the ELSER endpoint on EIS instead of the default endpoint. This is a great way to get started and try the new endpoint.