Troubleshoot an unbalanced cluster

Simplify monitoring with AutoOps

AutoOps is a monitoring tool that simplifies cluster management through performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. Learn more about AutoOps.

Elasticsearch assumes that all nodes within a data tier share the same hardware profile. This assumption lets its allocation feature balance the distribution of index shards across target nodes.

Allocation is computed sequentially: first from cluster-level settings and filters, then from each node's disk watermarks, and finally from shard allocation awareness. Within these logical prerequisites, such as data tiers, Elasticsearch balances shards to achieve a good compromise between:

current shard count
forecasted disk usage
write load (for indices in data streams)

There is no guarantee that any individual factor will be spread evenly across the nodes. For example, some nodes might have fewer shards or use less disk space, but be assigned shards with higher write loads.

When rebalancing shards, Elasticsearch does not consider the number or complexity of search queries. Search load is balanced only indirectly, as a side effect of balancing shard count and disk usage. Elasticsearch also does not consider the following factors:

current CPU usage
current JVM memory pressure
task queue backlog
which nodes coordinate the related tasks
which node is elected as the master node
write load of aliases or standalone indices

Check the cluster balance

To check the cluster's balance, use the cat allocation command to list workloads on each node:

    GET /_cat/allocation?v

The API returns the following response:

    shards shards.undesired write_load.forecast disk.indices.forecast disk.indices disk.used disk.avail disk.total disk.percent host          ip            node      node.role
                  0  3.8590562438622698               744.1gb      496.1gb   523.2gb      1.6tb      2.1tb           23 10.224.62.48  10.224.62.48  hot-09    hirs
                  0   4.020483253384615                 407gb      237.2gb   256.2gb      1.8tb      2.1tb           11 10.224.62.92  10.224.62.92  hot-09    hirs
                  1                 0.0                 2.6tb        1.8tb     1.8tb     10.1tb     11.9tb           15 10.224.62.119 10.224.62.119 cold-07   c
                  3                 0.0                 2.6tb        1.7tb     1.8tb     10.1tb     11.9tb           15 10.224.141.89 10.224.141.89 cold-05   c

This response contains the following information that influences balancing:

shards is the current number of shards allocated to the node
shards.undesired is the number of shards that need to be moved to other nodes to finish balancing
disk.indices.forecast is the expected disk usage according to projected shard growth
write_load.forecast is the projected total write load associated with this node

A cluster is considered balanced when all shards are in their desired locations, which means that no further shard movements are planned (all shards.undesired values are equal to 0).
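
The balance condition above can be checked programmatically. The following is a minimal Python sketch, assuming the allocation rows were fetched with `GET /_cat/allocation?format=json` and parsed into a list of dictionaries whose values are strings. The function names and the `sample_rows` values are hypothetical and do not come from a real cluster.

```python
def unbalanced_nodes(rows):
    """Return {node name: shards.undesired} for nodes that still have shards to move."""
    pending = {}
    for row in rows:
        raw = row.get("shards.undesired")
        # Assumption: values arrive as strings, and rows without a value
        # (for example an UNASSIGNED row) should be skipped.
        if raw in (None, ""):
            continue
        undesired = int(raw)
        if undesired > 0:
            pending[row.get("node", "unknown")] = undesired
    return pending


def is_balanced(rows):
    """A cluster counts as balanced when no node reports undesired shards."""
    return not unbalanced_nodes(rows)


# Hypothetical rows, shaped like parsed cat allocation JSON output.
sample_rows = [
    {"node": "hot-01", "shards": "120", "shards.undesired": "0"},
    {"node": "cold-05", "shards": "300", "shards.undesired": "3"},
    {"node": "UNASSIGNED", "shards": "2", "shards.undesired": None},
]

print(unbalanced_nodes(sample_rows))
print(is_balanced(sample_rows))
```

Returning the per-node counts rather than a single boolean makes it easier to see which nodes are still waiting for shard movements.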

Rebalancing behavior

Some operations, such as node restarting, decommissioning, or changing cluster allocation settings, are disruptive and might require multiple shards to move in order to rebalance the cluster.

Shard movement order is not deterministic. It is mostly determined by how ready the source and target nodes are to move a shard. While rebalancing is in progress, some nodes might appear busier than others.

A shard that is allocated to an undesired node uses the resources of its current node rather than its target node until it is moved. This might cause a disk or CPU hotspot when the current node holds multiple shards that have not yet been moved to their corresponding targets.

You can monitor shard migrations using the cat recovery command, which reports the progress of each migration as the migrated bytes percent (bp) of the total bytes (tb):

    GET _cat/recovery?v=true&expand_wildcards=all&active_only=true&h=time,tb,bp,top,ty,st,snode,tnode,idx,sh&s=time:desc
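
To find the migrations that are furthest from completion, the recovery rows can be ranked by their bp value. This is a rough Python sketch, assuming the same request was sent with `format=json` added and that the JSON keys match the column names passed in `h`. The helper names and the `sample_recoveries` data are hypothetical.

```python
def parse_percent(value):
    """Convert a cat-style percentage such as '45.2%' to a float."""
    return float(value.strip().rstrip("%"))


def slowest_recoveries(rows, limit=5):
    """Rank active recoveries by bytes percent (bp), least complete first."""
    ranked = sorted(rows, key=lambda r: parse_percent(r["bp"]))
    return [
        (r["idx"], r["sh"], r["snode"], r["tnode"], parse_percent(r["bp"]))
        for r in ranked[:limit]
    ]


# Hypothetical rows, shaped like parsed cat recovery JSON output.
sample_recoveries = [
    {"idx": "logs-b", "sh": "1", "snode": "hot-03", "tnode": "hot-01",
     "tb": "10gb", "bp": "80.0%"},
    {"idx": "logs-a", "sh": "0", "snode": "hot-01", "tnode": "hot-02",
     "tb": "40gb", "bp": "12.5%"},
]

for entry in slowest_recoveries(sample_recoveries):
    print(entry)
```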

Divergent balances

If a cluster takes a long time to finish rebalancing, you might find the following log entries:

    [WARN][o.e.c.r.a.a.DesiredBalanceReconciler] [10%] of assigned shards (10/100) are not on their desired nodes, which exceeds the warn threshold of [10%]

This is not concerning as long as the number of such shards is decreasing and this warning appears occasionally, for example after rolling restarts or changing allocation settings.
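
One way to tell whether the number of such shards is decreasing is to extract the counts from successive warnings and compare them. The sketch below is an illustration only: the regular expression is derived from the single log entry shown above, and the function names and `sample_log` lines are hypothetical.

```python
import re

# Pattern based on the DesiredBalanceReconciler warning format shown above.
WARNING_PATTERN = re.compile(
    r"\[(?P<percent>\d+(?:\.\d+)?)%\] of assigned shards "
    r"\((?P<undesired>\d+)/(?P<total>\d+)\) are not on their desired nodes"
)


def undesired_counts(log_lines):
    """Extract the undesired shard count from each matching warning, in log order."""
    counts = []
    for line in log_lines:
        match = WARNING_PATTERN.search(line)
        if match:
            counts.append(int(match.group("undesired")))
    return counts


def is_converging(counts):
    """True when the counts never increase and the latest is below the first."""
    if len(counts) < 2:
        return False
    never_increases = all(b <= a for a, b in zip(counts, counts[1:]))
    return never_increases and counts[-1] < counts[0]


# Hypothetical log lines that follow the format of the entry above.
sample_log = [
    "[WARN][o.e.c.r.a.a.DesiredBalanceReconciler] [30%] of assigned shards (30/100) are not on their desired nodes, which exceeds the warn threshold of [10%]",
    "[INFO][o.e.n.Node] unrelated line",
    "[WARN][o.e.c.r.a.a.DesiredBalanceReconciler] [20%] of assigned shards (20/100) are not on their desired nodes, which exceeds the warn threshold of [10%]",
    "[WARN][o.e.c.r.a.a.DesiredBalanceReconciler] [12%] of assigned shards (12/100) are not on their desired nodes, which exceeds the warn threshold of [10%]",
]

counts = undesired_counts(sample_log)
print(counts, is_converging(counts))
```

A sequence that stays flat or grows over several hours is the pattern that suggests the desired balance is diverging rather than converging.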

If the cluster logs this warning repeatedly for an extended period of time (multiple hours), the desired balance might be diverging too far from the current state.

If so, you should:

Increase the cluster.routing.allocation.balance.threshold setting to reduce the sensitivity of the algorithm that tries to even out the shard count and disk usage within the cluster.
Reset the desired balance using the following API call:
```
DELETE /_internal/desired_balance
```
Note

If your deployment runs on an orchestrating platform such as Elastic Cloud Hosted, Elastic Cloud Enterprise, or Elastic Cloud on Kubernetes, the desired balance can only be reset by a user with operator privileges. Refer to operator privileges for more information.
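
The threshold increase described above is typically applied through the cluster settings API. The following Python sketch only builds the request and does not send it. It assumes the setting is applied as a persistent setting via `PUT /_cluster/settings`. The function name is hypothetical, and the value 2.0 is an arbitrary illustration, not a recommendation.

```python
import json


def balance_threshold_request(threshold):
    """Build (method, path, body) for raising the balance threshold.

    Assumption: the setting is applied persistently via the cluster
    settings API; adjust to "transient" if that suits your workflow.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    body = {
        "persistent": {
            "cluster.routing.allocation.balance.threshold": threshold
        }
    }
    return "PUT", "/_cluster/settings", json.dumps(body)


method, path, body = balance_threshold_request(2.0)
print(method, path)
print(body)
```

Building the request separately from sending it makes it easy to review the exact payload before applying it to a production cluster.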