Total number of shards per node has been reached

Elasticsearch takes advantage of all available resources by distributing data (index shards) among the cluster nodes.

You can influence the data distribution by configuring the cluster.routing.allocation.total_shards_per_node dynamic cluster setting to restrict the number of shards that can be hosted on a single node in the cluster.

In earlier Elasticsearch versions, cluster.routing.allocation.total_shards_per_node is set to 1000. Reaching that limit causes the error Total number of shards per node has been reached, which requires adjusting this setting or reducing the number of shards. In Elasticsearch 9.x, this setting is not configured by default, which means there is no upper bound on the number of shards per node unless the setting is explicitly defined.

Various configurations limiting how many shards can be hosted on a single node can lead to shards being unassigned, because the cluster does not have enough nodes to satisfy the configuration. To ensure that each node carries a reasonable shard load, you might need to resize your deployment.
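
To see why a specific shard is unassigned, and whether the per-node shard limit is the cause, you can use the cluster allocation explain API. The index name, shard number, and primary flag in this sketch are placeholders; calling the API without a request body reports an arbitrary unassigned shard instead:

GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": true
}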

Follow these steps to resolve this issue:

  1. Check and adjust the cluster shard limit: determine the current value and increase it if needed.
  2. Determine which data tier needs more capacity: identify the tier where shards need to be allocated.
  3. Resize your deployment: add capacity to accommodate additional shards.

The cluster.routing.allocation.total_shards_per_node setting controls the maximum number of shards that can be allocated to each node in a cluster. When this limit is reached, Elasticsearch cannot assign new shards to that node, leading to unassigned shards in your cluster.

By checking the current value and increasing it, you allow more shards to be collocated on each node, which might resolve the allocation issue without adding more capacity to your cluster. For example, with three data nodes and a limit of 300 shards per node, the cluster can hold at most 900 shards; the next shard remains unassigned even if the nodes have spare disk and memory.

You can run the following steps using either the API console or direct Elasticsearch API calls.

Use the get cluster-wide settings API to inspect the current value of cluster.routing.allocation.total_shards_per_node:

GET /_cluster/settings?flat_settings

The response looks like this:

{
  "persistent": {
    "cluster.routing.allocation.total_shards_per_node": "300"
  },
  "transient": {}
}

The value of cluster.routing.allocation.total_shards_per_node (300 in this example) is the currently configured limit for the total number of shards that can reside on one node in the cluster. If the value is null or absent, no explicit limit is configured.
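
If you're not using the API console, the same request can be issued as a direct API call, for example with curl. A minimal sketch, assuming a cluster reachable at https://localhost:9200 and the built-in elastic user; adjust the URL, credentials, and TLS options for your environment:

curl -u elastic "https://localhost:9200/_cluster/settings?flat_settings&pretty"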

Use the update cluster settings API to increase the value to a higher number that accommodates your workload:

PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.total_shards_per_node" : 400
  }
}
In this example, the system-wide total_shards_per_node value is increased from the previous value of 300 to 400. The setting can also be set to null, which removes the upper bound on how many shards can be collocated on one node in the system.
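
For example, to remove the limit entirely instead of raising it, you can reset the setting to null:

PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.total_shards_per_node" : null
  }
}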

If increasing the cluster shard limit alone doesn't resolve the issue, or if you want to distribute shards more evenly, you need to identify which data tier requires additional capacity.

Use the get index settings API to retrieve the configured value for the index.routing.allocation.include._tier_preference setting:

GET /my-index-000001/_settings/index.routing.allocation.include._tier_preference?flat_settings

The response looks like this:

{
  "my-index-000001": {
    "settings": {
      "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
    }
  }
}

The index.routing.allocation.include._tier_preference value is a comma-separated list of data tier node roles this index is allowed to be allocated on. The first tier in the list has the highest priority and is the tier the index is targeting. In this example, the tier preference is data_warm,data_hot, so the index is targeting the warm tier. If the warm tier lacks capacity, the index will fall back to the data_hot tier.
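
Before resizing, it can also help to check how many shards each node currently holds and which tier roles your nodes have. A minimal sketch using the cat APIs; the column selection and sort order are only conveniences:

GET /_cat/allocation?v&h=node,shards&s=shards:desc
GET /_cat/nodes?v&h=name,node.role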

After you've identified the tier that needs more capacity, you can resize your deployment to distribute the shard load and allow previously unassigned shards to be allocated.

Warning

In ECE, resizing is limited by your allocator capacity.

To resize your deployment and increase its capacity by expanding a data tier or adding a new one, use the following options:

Option 1: Configure Autoscaling

  1. Log in to the Elastic Cloud console or ECE Cloud UI.
  2. On the home page, find your deployment and select Manage.
  3. Go to Actions > Edit deployment and check that autoscaling is enabled. Adjust the Enable Autoscaling for dropdown menu as needed and select Save.
  4. If autoscaling is successful, the cluster returns to a healthy status. If shards are still unassigned, check whether autoscaling has reached its set limits and update your autoscaling settings.

Option 2: Configure deployment size and tiers

You can increase the deployment capacity by editing the deployment and adjusting the size of the existing data tiers or adding new ones.

  1. In Kibana, open your deployment’s navigation menu (placed under the Elastic logo in the upper left corner) and go to Manage this deployment.
  2. On the right-hand side, expand the Manage dropdown and select Edit deployment from the list of options.
  3. On the Edit page, increase capacity for the data tier you identified earlier by either adding a new tier with + Add capacity or adjusting the size of an existing one. Choose the desired size and availability zones for that tier.
  4. Navigate to the bottom of the page and click the Save button.

Option 3: Change the hardware profiles/deployment templates

You can change the hardware profile for Elastic Cloud Hosted deployments or deployment template of the Elastic Cloud Enterprise cluster to one with a higher disk-to-memory ratio.

Option 4: Override disk quota

Elastic Cloud Enterprise administrators can temporarily override the disk quota of Elasticsearch nodes in real time as explained in Resource overrides. We strongly recommend making this change only under the guidance of Elastic Support, and only as a temporary measure or for troubleshooting purposes.

To increase the data node capacity in your cluster, you can add more nodes to the cluster and assign the index’s target tier node role to the new nodes, or increase the disk capacity of existing nodes. Disk expansion procedures depend on your operating system and storage infrastructure and are outside the scope of Elastic support. In practice, this is often achieved by removing a node from the cluster and reinstalling it with a larger disk.
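
For example, if the warm tier is the one that needs capacity, a new self-managed node can be assigned the corresponding role in its elasticsearch.yml before it joins the cluster. This is a minimal sketch; the exact roles depend on the tier you identified and on any other workloads the node should serve:

# elasticsearch.yml on the new node
node.roles: [ data_warm ]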

To increase the capacity of the data nodes in your Elastic Cloud on Kubernetes cluster, you can either add more data nodes to the desired tier, or increase the storage size of existing nodes.

Option 1: Add more data nodes

  1. Update the count field in your data node nodeSets to add more nodes:

    apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: quickstart
    spec:
      version: 9.3.0
      nodeSets:
      - name: data-nodes
        count: 5
        config:
          node.roles: ["data"]
        volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi

    In this example, count is increased from its previous value to 5 to add more data nodes.
  2. Apply the changes:

    kubectl apply -f your-elasticsearch-manifest.yaml

    ECK automatically creates the new nodes with the data node role, and Elasticsearch relocates shards to balance the load.

    You can monitor the progress using:

    GET /_cat/shards?v&h=state,node&s=state

Option 2: Increase storage size of existing nodes

  1. If your storage class supports volume expansion (you can verify this with the check shown after these steps), you can increase the storage size in the volumeClaimTemplates:

    apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: quickstart
    spec:
      version: 9.3.0
      nodeSets:
      - name: data-nodes
        count: 3
        config:
          node.roles: ["data"]
        volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 200Gi

    In this example, the storage request is increased from its previous size to 200Gi.
  2. Apply the changes. If the volume driver supports ExpandInUsePersistentVolumes, the filesystem will be resized online without restarting Elasticsearch. Otherwise, you might need to manually delete the Pods after the resize so they can be recreated with the expanded filesystem.
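
To confirm that your storage class supports volume expansion before you start, you can check its allowVolumeExpansion field. The storage class name standard in this sketch is a placeholder:

kubectl get storageclass standard -o jsonpath='{.allowVolumeExpansion}'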

For more information, refer to Update your deployments and Volume claim templates > Updating the volume claim settings.

Simplify monitoring with AutoOps

AutoOps is a monitoring tool that simplifies cluster management through performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. Learn more about AutoOps.