Rate limits
This page lists the rate limits that apply to Elastic Inference Service (EIS) models.
If you exceed a limit, the server returns HTTP 429 (Too Many Requests) responses until enough time has passed in the sliding window for your usage to drop back under the limit.
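A common client-side response to HTTP 429 is to retry with exponential backoff, which gives the sliding window time to move. The sketch below assumes a hypothetical `send` callable that performs one inference request and returns its HTTP status code. The retry count, base delay, and cap are illustrative values, not EIS recommendations.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` and retry while it returns HTTP 429.

    `send` is a hypothetical callable that issues one request and returns
    the HTTP status code. Returns the last status code observed.
    """
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            return status
        # Exponential backoff capped at max_delay, with "full jitter" so that
        # many throttled clients do not all retry at the same moment.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
        status = send()
    return status
```

You can inject `sleep` so the function can be tested without real waiting. If a 429 response carries a standard `Retry-After` header, it is generally better to wait that long. This page does not say whether EIS sets that header.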
| Model | Requests per minute | Tokens per minute (ingest) | Tokens per minute (search) | Notes |
|---|---|---|---|---|
| Elastic Managed LLMs | 2,000 | - | - | No rate limit on tokens. |
| ELSER | 6,000 | 6,000,000 | 600,000 | Both the request limit and the token limit apply. The one reached first triggers throttling. |
| Jina Embeddings v5 Nano | 6,000 | 6,000,000 | 600,000 | Both the request limit and the token limit apply. The one reached first triggers throttling. |
| Jina Embeddings v5 Small | 6,000 | 6,000,000 | 600,000 | Both the request limit and the token limit apply. The one reached first triggers throttling. |
| Jina Embeddings v3 | 6,000 | 6,000,000 | 600,000 | Both the request limit and the token limit apply. The one reached first triggers throttling. |
| Jina Reranker v2 | 600 | - | 6,000,000 | Both the request limit and the token limit apply. The one reached first triggers throttling. |
| Jina Reranker v3 | 600 | - | 6,000,000 | Both the request limit and the token limit apply. The one reached first triggers throttling. |