Rate limiting

The Elastic Cloud Managed OTLP Endpoint uses queue-based rate limiting to manage data ingestion. Rate limiting occurs when data is received faster than the backend can process and index it. If the queue backlog grows beyond capacity, the endpoint responds with HTTP 429 errors until the backlog is consumed.
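The behavior described above can be pictured with a toy model. The sketch below is not Elastic's implementation: the class name `BoundedIngestQueue`, the `capacity` parameter, and the use of status 200 for accepted batches are assumptions chosen only to illustrate how a full backlog leads to 429 responses and how draining it restores acceptance.

```python
from collections import deque


class BoundedIngestQueue:
    """Toy model: accept batches until the backlog is full, then answer 429."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # hypothetical maximum backlog size
        self.backlog = deque()

    def receive(self, batch: str) -> int:
        """Return the HTTP status a sender would see for this batch."""
        if len(self.backlog) >= self.capacity:
            return 429  # backlog is full: reject until it drains
        self.backlog.append(batch)
        return 200

    def drain(self, n: int) -> int:
        """Simulate the backend indexing up to n queued batches."""
        processed = 0
        while self.backlog and processed < n:
            self.backlog.popleft()
            processed += 1
        return processed


queue = BoundedIngestQueue(capacity=2)
statuses = [queue.receive(f"batch-{i}") for i in range(3)]
queue.drain(1)  # the backend catches up by one batch
statuses.append(queue.receive("batch-3"))
print(statuses)  # [200, 200, 429, 200]
```

The third batch is rejected because the backlog is already at capacity. Once one batch has been processed, the next request is accepted again.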

How rate limiting works

Rate limiting behavior differs by deployment type:

Elastic Cloud Hosted: Rate limits depend on your Elasticsearch cluster capacity. If your cluster can't keep up with incoming data, the endpoint starts rejecting requests with 429 errors.
Elastic Cloud Serverless: Elastic manages scaling automatically. Rate limiting is rare and typically indicates a temporary backend scaling event.

Identifying rate limiting

When rate limiting occurs, the Elastic Cloud Managed OTLP Endpoint responds with an HTTP 429 Too Many Requests status code. A log message similar to this appears in the OpenTelemetry Collector's output:

{
  "code": 8,
  "message": "error exporting items, request to <ingest endpoint> responded with HTTP Status Code 429"
}

For troubleshooting steps, refer to "Error: too many requests".
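If you collect Collector output and want to flag these entries automatically, a small script can do it. The following is a minimal sketch, assuming each entry is a JSON object shaped like the sample above and that the status appears in the message as "HTTP Status Code <number>". The function names are hypothetical, and the actual log format may vary with your Collector version and logging configuration.

```python
import json
import re
from typing import Optional

# Assumption: the status appears in the message as "HTTP Status Code <number>",
# matching the sample log entry shown above.
STATUS_PATTERN = re.compile(r"HTTP Status Code (\d{3})")


def extract_status(raw_entry: str) -> Optional[int]:
    """Return the HTTP status mentioned in a JSON log entry, or None."""
    try:
        entry = json.loads(raw_entry)
    except json.JSONDecodeError:
        return None  # not a JSON entry
    if not isinstance(entry, dict):
        return None
    match = STATUS_PATTERN.search(str(entry.get("message", "")))
    return int(match.group(1)) if match else None


def is_rate_limited(raw_entry: str) -> bool:
    """True when the entry reports an HTTP 429 response."""
    return extract_status(raw_entry) == 429


sample = (
    '{"code": 8, "message": "error exporting items, request to '
    '<ingest endpoint> responded with HTTP Status Code 429"}'
)
print(is_rate_limited(sample))  # True
```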

Resolving rate limiting

Elastic Cloud Hosted deployments

For Elastic Cloud Hosted deployments, 429 errors typically indicate that your Elasticsearch cluster is undersized for the current data volume. Use AutoOps to check CPU utilization, index queue depth, and node load to confirm whether your cluster is under-resourced.

If the metrics confirm that the cluster needs more capacity, scale your deployment.

Once the queue backlog is consumed and Elasticsearch capacity matches the data volume, requests are automatically accepted again.
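Because requests are accepted again once the backlog drains, a custom sender can wait and retry instead of dropping data. The sketch below shows one common approach, exponential backoff with jitter. It is an illustration under stated assumptions, not a description of how any Elastic or OpenTelemetry component behaves: `send_batch` is a hypothetical callable that sends one batch and returns the HTTP status code, and the delay values are arbitrary.

```python
import random
import time
from typing import Callable


def send_with_backoff(
    send_batch: Callable[[], int],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send_batch until it returns a status other than 429
    or the attempts run out, and return the last status seen."""
    delay = initial_delay
    status = send_batch()
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        # Full jitter: wait a random time up to the current delay.
        sleep(random.uniform(0, delay))
        delay = min(delay * 2, max_delay)
        status = send_batch()
    return status


# Usage with a stand-in sender that is rate limited twice, then accepted.
# Passing waits.append as `sleep` records the delays instead of pausing.
responses = iter([429, 429, 200])
waits = []
final_status = send_with_backoff(lambda: next(responses), sleep=waits.append)
print(final_status, len(waits))  # 200 2
```

Retrying only helps with short bursts. If 429 responses persist, the cluster still needs more capacity for the data volume.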

Elastic Cloud Serverless deployments

For Elastic Cloud Serverless projects, Elastic manages backend scaling automatically. If you experience persistent 429 errors, contact Elastic Support.