Rate limits

This page lists the rate limits that apply to Elastic Inference Service (EIS) models.

Exceeding a limit causes the server to return HTTP 429 (Too Many Requests) responses until the sliding window advances far enough for part of the limit to reset.
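A common way for a client to ride out 429 responses is to retry with capped exponential backoff and random jitter, giving the window time to slide without many clients retrying in lockstep. The sketch below is illustrative only: `call_with_backoff` is a hypothetical helper (not part of any Elastic client), it assumes a caller-supplied `send` function that returns `(status_code, body)`, and the default delays are arbitrary rather than tuned to EIS.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical response shape returned by the caller's own HTTP function.
Response = Tuple[int, str]  # (status_code, body)


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying on HTTP 429 with capped exponential backoff and full jitter.

    Returns the first non-429 response, or the last 429 response once
    `max_retries` retries have been used up.
    """
    response = send()
    for attempt in range(max_retries):
        if response[0] != 429:
            return response
        # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
        sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
        response = send()
    return response


# Usage with a simulated endpoint that throttles twice, then succeeds.
responses = iter([(429, "throttled"), (429, "throttled"), (200, "ok")])
print(call_with_backoff(lambda: next(responses), sleep=lambda s: None))  # → (200, 'ok')
```

Injecting `sleep` keeps the helper testable without real waiting; in production the default `time.sleep` applies.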

| Model | Requests/minute | Tokens/minute (ingest) | Tokens/minute (search) | Notes |
|---|---|---|---|---|
| Elastic Managed LLMs | 2,000 | - | - | No rate limit on tokens. |
| ELSER | 6,000 | 6,000,000 | 600,000 | Requests per minute and tokens per minute are both enforced; whichever limit is reached first applies. |
| Jina Embeddings v5 Nano | 6,000 | 6,000,000 | 600,000 | Requests per minute and tokens per minute are both enforced; whichever limit is reached first applies. |
| Jina Embeddings v5 Small | 6,000 | 6,000,000 | 600,000 | Requests per minute and tokens per minute are both enforced; whichever limit is reached first applies. |
| Jina Embeddings v3 | 6,000 | 6,000,000 | 600,000 | Requests per minute and tokens per minute are both enforced; whichever limit is reached first applies. |
| Jina Reranker v2 | 600 | - | 6,000,000 | Requests per minute and tokens per minute are both enforced; whichever limit is reached first applies. |
| Jina Reranker v3 | 600 | - | 6,000,000 | Requests per minute and tokens per minute are both enforced; whichever limit is reached first applies. |
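The "whichever limit is reached first" rule can be approximated on the client side with a sliding-window log that tracks a request budget and a token budget over the same 60-second window, so a batch job can hold back calls before the server starts returning 429s. This is a sketch under stated assumptions: `DualSlidingWindowLimiter` is a hypothetical helper, the exact windowing algorithm EIS uses server-side is not described on this page, and the token counts passed in are the caller's own estimates, which may differ from how the service counts tokens.

```python
import collections
import time
from typing import Callable, Deque, Tuple


class DualSlidingWindowLimiter:
    """Hypothetical client-side limiter: a sliding-window log enforcing a request
    budget and a token budget over one window; a call is allowed only if BOTH fit."""

    def __init__(
        self,
        max_requests: int,
        max_tokens: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = window_seconds
        self.clock = clock
        self._events: Deque[Tuple[float, int]] = collections.deque()  # (timestamp, tokens)
        self._tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop calls that have slid out of the window, freeing part of both budgets.
        while self._events and self._events[0][0] <= now - self.window:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Record the call and return True if it fits under both limits, else False."""
        now = self.clock()
        self._evict(now)
        if len(self._events) + 1 > self.max_requests:
            return False  # request limit would be exceeded
        if self._tokens_in_window + tokens > self.max_tokens:
            return False  # token limit would be exceeded
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True


# Usage with the ELSER ingest limits from the table and a fake clock.
now = [0.0]
limiter = DualSlidingWindowLimiter(6_000, 6_000_000, clock=lambda: now[0])
# 2,000-token requests exhaust the token budget after 3,000 calls,
# well before the 6,000-request budget is used up.
allowed = sum(limiter.try_acquire(2_000) for _ in range(6_000))
print(allowed)  # → 3000
now[0] = 60.0  # one full window later, the earlier calls have slid out
print(limiter.try_acquire(2_000))  # → True
```

As a rule of thumb at a steady rate, the two ELSER ingest limits meet at an average of 6,000,000 / 6,000 = 1,000 tokens per request: larger requests hit the token limit first, smaller ones hit the request limit first. For search the break-even is 600,000 / 6,000 = 100 tokens per request, and for the rerankers it is 6,000,000 / 600 = 10,000 tokens per request.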