Symptom: High CPU usage
Elasticsearch uses thread pools to manage node CPU and JVM resources for concurrent operations. Each thread pool is allotted a different number of threads, frequently based on the total processors allocated to the node. This helps the node remain responsive while processing either expensive tasks or a backlog of queued tasks. When a thread pool's queue is saturated, Elasticsearch rejects requests related to that thread pool.
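To see whether any thread pool is currently queueing or rejecting work, one starting point is the cat thread pool API; the column selection and sort order shown here are just one possible choice:
GET _cat/thread_pool?v=true&h=node_name,name,active,queue,rejected&s=rejected:desc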
An individual task can spawn work on multiple node threads, frequently within these designated thread pools. It is normal for an individual thread to saturate its CPU usage. A thread reporting CPU saturation might be spending its time on work from a single expensive task, or it might be staying busy with work from many tasks. The hot threads report shows a snapshot of Java threads across a time interval, so hot threads cannot be directly mapped to any given node task.
A node can temporarily saturate all of the CPU threads allocated to it. It's unusual for this state to persist for an extended period; if it does, it might suggest that the node is:
- sized disproportionately to its data tier peers.
- receiving a volume of requests above its workload capability, for example if the node is sized below the minimum recommendations.
- processing an expensive task.
To mitigate performance outages, we recommend capturing an Elasticsearch diagnostic for post-mortem analysis, but attempting to resolve the immediate issue by scaling.
Refer to the sections below to troubleshoot degraded CPU performance.
To check the CPU usage per node, use the cat nodes API:
GET _cat/nodes?v=true&s=cpu:desc&h=name,role,master,cpu,load*,allocated_processors
The reported metrics are:
- cpu: the instantaneous percentage of system CPU usage
- load_1m, load_5m, and load_15m: the average number of processes waiting over the designated time interval
- allocated_processors: the number of processors allocated to the node
For more detail, refer to the node statistics API documentation.
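If you prefer the JSON form of the same CPU metrics, here is a sketch using the node stats API; the filter_path shown is illustrative and can be adjusted to your needs:
GET _nodes/stats/os,process?filter_path=nodes.*.name,nodes.*.os.cpu,nodes.*.process.cpu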
The alerting thresholds for these metrics depend on your team's workload and duration needs. However, as a general starting baseline, you might consider investigating if:
- (Recommended) CPU usage remains elevated above 95% for an extended interval.
- Load average divided by the node's allocated processors is elevated. This metric by itself is insufficient as a gauge and should be considered alongside elevated response times, as it otherwise might reflect normal background I/O.
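For example, a load_1m of 7.2 on a node with allocated_processors of 8 gives a ratio of 7.2 / 8 = 0.9; sustained ratios near or above 1.0, alongside elevated response times, suggest the node is CPU constrained. (These numbers are illustrative.)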
If CPU usage is deemed concerning, we recommend checking this output for traffic patterns that are either segmented by or hot spotted within the role and master columns. CPU issues spanning an entire data tier suggest a configuration issue or an undersized tier. CPU issues spanning only a subset of nodes within one or more data tiers suggest task hot spotting.
High CPU usage frequently correlates to a long-running task or a backlog of tasks. When a node reports elevated CPU usage, use the nodes hot threads API to check for resource-intensive threads running on it and to help correlate threads to tasks:
GET _nodes/hot_threads
This API returns a snapshot of hot Java threads. As a simplified example, the response output might appear like the following:
::: {instance-0000000001}{9fVI1XoXQJCgHwsOPlVEig}{RrJGwEaESRmNs75Gjs1SOg}{instance-0000000001}{10.42.9.84}{10.42.9.84:19058}{himrst}{9.3.0}{7000099-8525000}{region=unknown-region, server_name=instance-0000000001.b84ab96b481f43d791a1a73477a10d40, xpack.installed=true, transform.config_version=10.0.0, ml.config_version=12.0.0, data=hot, logical_availability_zone=zone-1, availability_zone=us-central1-a, instance_configuration=gcp.es.datahot.n2.68x10x45}
Hot threads at 2025-05-14T17:59:30.199Z, interval=500ms, busiestThreads=10000, ignoreIdleThreads=true:
88.5% [cpu=88.5%, other=0.0%] (442.5ms out of 500ms) cpu usage by thread '[write]'
8/10 snapshots sharing following 29 elements
com.fasterxml.jackson.dataformat.smile@2.17.2/com.fasterxml.jackson.dataformat.smile.SmileParser.nextToken(SmileParser.java:434)
org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.doAdd(LocalBulk.java:69)
# ...
2/10 snapshots sharing following 37 elements
app/org.elasticsearch.xcontent/org.elasticsearch.xcontent.support.filtering.FilterPath$FilterPathBuilder.insertNode(FilterPath.java:172)
# ...
The response output is formatted as follows:
::: {NAME}{ID}{...}{HOST_NAME}{ADDRESS}{...}{ROLES}{VERSION}{...}{ATTRIBUTES}
Hot threads at TIMESTAMP, interval=INTERVAL_FROM_API, busiestThreads=THREADS_FROM_API, ignoreIdleThreads=IDLE_FROM_API:
TOTAL_CPU% [cpu=ELASTIC_CPU%, other=OTHER_CPU%] (Xms out of INTERVAL_FROM_API) cpu usage by thread 'THREAD'
X/... snapshots sharing following X elements
STACKTRACE_SAMPLE
# ...
X/... snapshots sharing following X elements
STACKTRACE_SAMPLE
# ...
Three measures of CPU time are reported in the API output:
- TOTAL_CPU: the total CPU used by the thread, whether by Elasticsearch or the operating system
- ELASTIC_CPU: the CPU available to and used by Elasticsearch
- OTHER_CPU: a miscellaneous bucket covering disk/network I/O and garbage collection (GC)
Although ELASTIC_CPU is usually the main driver of elevated TOTAL_CPU, you should also investigate the STACKTRACE_SAMPLE. These lines frequently reference Elasticsearch loggers but might also surface non-Elasticsearch processes. Common examples of performance-related entries include:
- org.elasticsearch.action.search or org.elasticsearch.search is a running search
- org.elasticsearch.cluster.metadata.Metadata.findAliases is an alias look-up/resolver
- org.elasticsearch.common.regex is custom regex code
- org.elasticsearch.grok is custom Grok code
- org.elasticsearch.index.fielddata.ordinals.GlobalOrdinalsBuilder.build is building global ordinals
- org.elasticsearch.ingest.Pipeline or org.elasticsearch.ingest.CompoundProcessor is an ingest pipeline
- org.elasticsearch.xpack.core.esql or org.elasticsearch.xpack.esql is a running ES|QL query
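To narrow the hot threads report, the API accepts optional query parameters, and you can target a single node by name or ID; the parameter values below are illustrative:
GET _nodes/hot_threads?type=cpu&threads=5&interval=1s&ignore_idle_threads=true
GET _nodes/instance-0000000001/hot_threads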
If your team would like assistance correlating hot threads and node tasks, capture your Elasticsearch diagnostics when you contact us.
High CPU usage is often caused by excessive JVM garbage collection (GC) activity. This excessive GC typically arises from configuration problems or inefficient queries causing increased heap memory usage.
For troubleshooting information, refer to high JVM memory pressure.
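As a quick first check before that deeper troubleshooting, you can sample heap usage and collector activity from the node stats API; the filter_path shown is illustrative:
GET _nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors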
AutoOps is a monitoring tool that simplifies cluster management through performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. Learn more about AutoOps.
To track CPU usage over time, we recommend enabling monitoring:
(Recommended) Enable AutoOps.
Enable logs and metrics. When logs and metrics are enabled, monitoring information is visible on Kibana's Stack Monitoring page.
You can also enable the CPU usage threshold alert to be notified about potential issues through email.
From your deployment menu, view the Performance page. On this page, you can view two key metrics:
- CPU usage: Your deployment’s CPU usage, represented as a percentage.
- CPU credits: Your remaining CPU credits, measured in seconds of CPU time.
Elastic Cloud Hosted grants CPU credits per deployment to provide smaller clusters with performance boosts when needed. High CPU usage can deplete these credits, which might lead to performance degradation and increased cluster response times.
(Recommended) Enable AutoOps.
Enable Elasticsearch monitoring. When logs and metrics are enabled, monitoring information is visible on Kibana's Stack Monitoring page.
You can also enable the CPU usage threshold alert to be notified about potential issues through email.
You might also consider enabling slow logs to review as part of the task backlog.
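Slow logs are enabled per index through index settings; a sketch follows, where the index name and threshold values are illustrative and should be tuned to your workload:
PUT /my-index-000001/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}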
High CPU usage usually correlates to expensive tasks that are currently running against the node or to a backlog of queued tasks. The following tips outline common causes and solutions for heightened CPU usage, including usage that occurs even during periods of low or no traffic.
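Before working through the tips below, you can check what is currently running and what is queued at the cluster level; the group_by value here is one option among several:
GET _tasks?detailed=true&group_by=parents
GET _cluster/pending_tasks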
Oversharding occurs when a cluster has too many shards, often caused by shards being smaller than optimal. We recommend the following best practices:
- Aim for shards of up to 200M documents, or with sizes between 10GB and 50GB.
- Master-eligible nodes should have at least 1GB of heap per 3000 indices.
While Elasticsearch doesn’t have a strict minimum shard size, an excessive number of small shards can negatively impact performance. Each shard consumes cluster resources because Elasticsearch must maintain metadata and manage shard states across all nodes.
If you have too many small shards, you can address this by doing the following:
- Removing empty or unused indices.
- Deleting or closing indices containing outdated or unnecessary data.
- Reindexing smaller shards into fewer, larger shards to optimize cluster performance.
If your shards are sized correctly but you are still experiencing oversharding, creating a more aggressive index lifecycle management strategy or deleting old indices can help reduce the number of shards.
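As one sketch of consolidating small indices, the reindex API can copy several small daily indices into a single larger one; the index names below are hypothetical:
POST _reindex
{
  "source": { "index": "logs-2025.01.*" },
  "dest": { "index": "logs-2025.01" }
}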
By default, Elasticsearch allocates processors equal to the number reported available by the operating system. You can override this behaviour by adjusting the value of node.processors, but this advanced setting should be configured only after you've performed load testing.
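A minimal elasticsearch.yml sketch, assuming load testing showed the node performs best when restricted to 4 of its available processors (node.processors is a static setting, so the node must be restarted for it to take effect):
node.processors: 4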
Elastic Cloud Hosted supports vCPU boosting, which should be relied on only for short traffic bursts, not for normal workload traffic.