---
title: Rate limits
description: Learn about rate limits for Elastic Inference Service (EIS) models.
url: https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6768/explore-analyze/elastic-inference/eis-rate-limits
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Rate limits
This page lists the rate limits that apply to Elastic Inference Service (EIS) models.
Exceeding a limit results in HTTP 429 (Too Many Requests) responses. These continue until earlier usage ages out of the sliding window and frees up enough capacity under the limit.
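Because requests over a limit fail with HTTP 429 until capacity frees up, clients usually retry with a growing delay rather than immediately. The following is a minimal sketch of exponential backoff with jitter. This is a common general-purpose technique, not an Elastic-provided client. `call_with_backoff` and the `send_request` callable are hypothetical. `send_request` is assumed to perform one inference request and return a status code and body. The sketch doesn't read a `Retry-After` header, since this page doesn't state whether EIS returns one.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send_request: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send_request, retrying with exponential backoff while it returns HTTP 429.

    send_request is a hypothetical callable that performs one inference
    request and returns (status_code, body).
    """
    attempt = 0
    while True:
        status, body = send_request()
        # Return on any non-429 status, or give up once retries are exhausted.
        if status != 429 or attempt >= max_retries:
            return status, body
        # Double the delay on each retry, cap it at max_delay, and scale it by a
        # random factor in [0.5, 1.0] so concurrent clients don't retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay * random.uniform(0.5, 1.0))
        attempt += 1
```

Passing `sleep` as a parameter lets tests substitute a fake that records delays instead of actually waiting.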

| Model                                                                                            | Requests per minute | Tokens per minute (ingest) | Tokens per minute (search) | Notes                                                                                  |
|--------------------------------------------------------------------------------------------------|---------------------|----------------------------|----------------------------|----------------------------------------------------------------------------------------|
| Elastic Managed LLMs <applies-to>Elastic Stack: Generally available since 9.3</applies-to>       | 2,000               | -                          | -                          | No token limit; only the request limit applies.                                        |
| ELSER <applies-to>Elastic Stack: Generally available since 9.0</applies-to>                      | 6,000               | 6,000,000                  | 600,000                    | Both the request and token limits apply; whichever is reached first takes effect.     |
| Jina Embeddings v5 Nano <applies-to>Elastic Stack: Generally available since 9.3</applies-to>    | 6,000               | 6,000,000                  | 600,000                    | Both the request and token limits apply; whichever is reached first takes effect.     |
| Jina Embeddings v5 Small <applies-to>Elastic Stack: Generally available since 9.3</applies-to>   | 6,000               | 6,000,000                  | 600,000                    | Both the request and token limits apply; whichever is reached first takes effect.     |
| Jina Embeddings v3 <applies-to>Elastic Stack: Generally available since 9.3</applies-to>         | 6,000               | 6,000,000                  | 600,000                    | Both the request and token limits apply; whichever is reached first takes effect.     |
| Jina Reranker v2 <applies-to>Elastic Stack: Generally available since 9.3</applies-to>           | 600                 | -                          | 6,000,000                  | Both the request and token limits apply; whichever is reached first takes effect.     |
| Jina Reranker v3 <applies-to>Elastic Stack: Generally available since 9.3</applies-to>           | 600                 | -                          | 6,000,000                  | Both the request and token limits apply; whichever is reached first takes effect.     |
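The table's figures show which limit binds first. For ELSER ingest, 6,000,000 tokens / 6,000 requests = 1,000 tokens per request on average. Below that average, the request limit is reached first; above it, the token limit is. For search, the break-even point is 600,000 / 6,000 = 100 tokens per request.

To stay under both limits proactively, a client can track its own usage over a sliding 60-second window. The following is a minimal client-side sketch, assuming a sliding log of (timestamp, token count) events. It is not a description of how EIS enforces limits server-side, which this page doesn't document. `SlidingWindowLimiter` is a hypothetical helper. The caller is assumed to supply an estimated token count for each request.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowLimiter:
    """Hypothetical client-side limiter for requests and tokens over a sliding window.

    A request is admitted only if it fits within BOTH the request budget and
    the token budget, so whichever limit is reached first blocks it.
    """

    def __init__(
        self,
        max_requests: int,
        max_tokens: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = window_seconds
        self.clock = clock
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self._tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop events that have slid out of the window, freeing their share of both budgets.
        while self._events and self._events[0][0] <= now - self.window:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits both budgets; otherwise return False."""
        now = self.clock()
        self._evict(now)
        if len(self._events) + 1 > self.max_requests:
            return False  # request limit reached
        if self._tokens_in_window + tokens > self.max_tokens:
            return False  # token limit reached
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True
```

For example, `SlidingWindowLimiter(max_requests=6_000, max_tokens=6_000_000)` mirrors the ELSER ingest limits above. When `try_acquire` returns `False`, the client can wait and try again instead of sending a request that is likely to receive HTTP 429.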