﻿---
title: Diagnose unavailable nodes
description: This section provides a list of common symptoms and possible actions that you can take to resolve issues when one or more nodes become unhealthy or unavailable...
url: https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/monitoring/unavailable-nodes
products:
  - Elastic Cloud Hosted
applies_to:
  - Elastic Cloud Hosted: Generally available
---

# Diagnose unavailable nodes
This section provides a list of common symptoms and possible actions that you can take to resolve issues when one or more nodes become unhealthy or unavailable. This guide is particularly useful if you are not [shipping your logs and metrics](https://www.elastic.co/elastic/docs-builder/docs/3016/deploy-manage/monitor/stack-monitoring/ece-ech-stack-monitoring) to a dedicated monitoring cluster.
**What are the symptoms?**
- [Full disk on single-node deployment](#ec-single-node-deployment-disk-used)
- [Full disk on multiple-nodes deployment](#ec-multiple-node-deployment-disk-used)
- [JVM heap usage exceeds the allowed threshold on master nodes](#ec-jvm-heap-usage-exceed-allowed-threshold)
- [CPU usage exceeds the allowed threshold on master nodes](#ec-cpu-usage-exceed-allowed-threshold)
- [Some nodes are unavailable and are displayed as missing](#ec-nodes-unavailable-missing)

**What is the impact?**
- Only some search results are successful
- Ingesting, updating, and deleting data do not work
- Most Elasticsearch API requests fail

<note>
  Some actions described here, such as stopping indexing or Machine Learning jobs, are temporary remediations intended to get your cluster into a state where you can make configuration changes to resolve the issue.
</note>

For production deployments, we recommend setting up a dedicated monitoring cluster that collects metrics and logs and provides troubleshooting views and cluster alerts.
If your issue is not addressed here, then [contact Elastic support for help](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot).
<admonition title="Simplify monitoring with AutoOps">
  AutoOps is a [monitoring](https://www.elastic.co/elastic/docs-builder/docs/3016/deploy-manage/monitor) tool that simplifies cluster management through performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. Learn more about [AutoOps](https://www.elastic.co/elastic/docs-builder/docs/3016/deploy-manage/monitor/autoops).
</admonition>


## Full disk on single-node deployment

**Health check**
1. Log in to the [Elastic Cloud Console](https://cloud.elastic.co?page=docs&placement=docs-body).
2. Click the **Manage** link corresponding to the deployment that you want to manage.
3. On your deployment page, scroll down to **Instances** and check if the disk allocation for your Elasticsearch instance is over 90%.
   ![Full disk on single-node deployment](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-full-disk-single-node.png)
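
If the Elasticsearch API is still reachable, you can approximate the same check outside the console. The following is a minimal sketch rather than an official tool. It assumes you have already captured the JSON output of `GET _cat/allocation?format=json` (for example, from the Kibana console). The function name and the sample values are hypothetical. The 90% threshold mirrors the health check above.

```python
def instances_over_disk_threshold(allocation_rows, threshold_percent=90):
    """Return (node, percent) pairs whose disk usage exceeds the threshold.

    allocation_rows: parsed JSON from `GET _cat/allocation?format=json`.
    """
    flagged = []
    for row in allocation_rows:
        percent = row.get("disk.percent")
        # Rows without disk figures (such as UNASSIGNED) are skipped.
        if percent is None:
            continue
        if int(percent) > threshold_percent:
            flagged.append((row["node"], int(percent)))
    return flagged


# Hypothetical sample response; real values come from your deployment.
sample_rows = [
    {"node": "instance-0000000000", "disk.percent": "93"},
    {"node": "UNASSIGNED", "disk.percent": None},
]
print(instances_over_disk_threshold(sample_rows))
```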

**Possible cause**
- The available storage is insufficient for the amount of ingested data.

**Resolution**
- [Delete unused data](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-delete).

<note>
  You can delete unused data by running either:
  - API calls using the [Kibana console](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/query-filter/tools/console), if available
  - direct [Elasticsearch API](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/elasticsearch/rest-apis) calls, when Elasticsearch has an elected quorum.
</note>

- Increase the disk size on your Hot data and Content tier (scale up).

<note>
  If your Elasticsearch cluster is unhealthy and reports a status of red, then increasing the disk size of your Hot data and Content tier may fail. You might need to delete some data so the configuration can be edited. If you want to increase your disk size without deleting data, then [reach out to Elastic support](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot) and we will assist you with scaling up.
</note>
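
To decide which data to delete, it can help to rank indices by size first. The sketch below is an illustration under stated assumptions: it expects the parsed JSON output of `GET _cat/indices?format=json&bytes=b`, and the function name and sample index names are hypothetical.

```python
def largest_indices(index_rows, limit=5):
    """Return the `limit` largest indices as (name, size_in_bytes) pairs.

    index_rows: parsed JSON from `GET _cat/indices?format=json&bytes=b`.
    """
    sized = [
        (row["index"], int(row["store.size"]))
        for row in index_rows
        if row.get("store.size") is not None  # skip rows that report no size
    ]
    return sorted(sized, key=lambda pair: pair[1], reverse=True)[:limit]


# Hypothetical sample response; index names and sizes are made up.
sample_rows = [
    {"index": "logs-2024.06", "store.size": "1073741824"},
    {"index": "logs-2023.01", "store.size": "5368709120"},
    {"index": "no-size-reported", "store.size": None},
]
for name, size in largest_indices(sample_rows, limit=2):
    print(name, size)
```

After you confirm that an index is no longer needed, you can remove it with the delete index API linked above.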

**Preventions**
- Increase the disk size on your Hot data and Content tier (scale up).
  From your deployment menu, go to the **Edit** page and increase the **Size per zone** for your Hot data and Content tiers.
  ![Increase size per zone](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-increase-size-per-zone.png)
- Enable [autoscaling](https://www.elastic.co/elastic/docs-builder/docs/3016/deploy-manage/autoscaling) to grow your cluster automatically when it runs out of space.
- Configure [ILM](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/lifecycle/index-lifecycle-management) policies to automatically delete unused data.
- Add nodes to your Elasticsearch cluster and enable [data tiers](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/lifecycle/data-tiers) to move older data that you don’t query often to more cost-effective storage.
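
As an illustration of the ILM prevention above, the following sketch builds the request body for a lifecycle policy that rolls over indices and later deletes them. The function name and the `7d` and `30d` values are assumptions chosen for the example, not recommendations. You would send the resulting JSON with `PUT _ilm/policy/<policy-name>` and adjust the ages to your own retention requirements.

```python
import json


def build_retention_policy(rollover_after="7d", delete_after="30d"):
    """Build an ILM policy body with a hot phase rollover and a delete phase."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {"rollover": {"max_age": rollover_after}}
                },
                "delete": {
                    # With rollover in use, min_age counts from rollover time.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


# Print the body so it can be pasted into the Kibana console.
print(json.dumps(build_retention_policy(), indent=2))
```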


## Full disk on multiple-nodes deployment

**Health check**
1. Log in to the [Elastic Cloud Console](https://cloud.elastic.co?page=docs&placement=docs-body).
2. From the Elasticsearch Service panel, click the **Quick link** icon corresponding to the deployment that you want to manage.
   ![Quick link to the deployment page](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-quick-link-to-deployment.png)
3. On your deployment page, scroll down to **Instances** and check if the disk allocation for any of your Elasticsearch instances is over 90%.
   ![Full disk on multiple-nodes deployment](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-full-disk-multiple-nodes.png)

**Possible cause**
- The available storage is insufficient for the amount of ingested data.

**Resolution**
- [Delete unused data](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-delete).

<note>
  You can delete unused data by running either:
  - API calls using the [Kibana console](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/query-filter/tools/console), if available
  - direct [Elasticsearch API](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/elasticsearch/rest-apis) calls, when Elasticsearch has an elected quorum.
</note>

- Increase the disk size (scale up).

<note>
  If your Elasticsearch cluster is unhealthy and reports a status of red, then the scale-up configuration change that increases the disk size on the affected data tiers may fail. You might need to delete some data so the configuration can be edited. If you want to increase your disk size without deleting data, then [reach out to Elastic support](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot) and we will assist you with scaling up.
</note>

**Preventions**
- Increase the disk size (scale up).
  1. On your deployment page, scroll down to **Instances** and identify the node attribute of the instances that are running out of disk space.
     ![Instance node attribute](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-node-attribute.png)
  2. Use the node types identified in step 1 to find the corresponding data tier.
     ![Node type and corresponding attribute](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-node-types-data-tiers.png)
  3. From your deployment menu, go to the **Edit** page and increase the **Size per zone** for the data tiers identified in step 2.
     ![Increase size per zone](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-increase-size-per-zone.png)
- Enable [autoscaling](https://www.elastic.co/elastic/docs-builder/docs/3016/deploy-manage/autoscaling) to grow your cluster automatically when it runs out of space.
- Configure [ILM](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/lifecycle/index-lifecycle-management) policies to automatically delete unused data.
- Enable [data tiers](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/lifecycle/data-tiers) to move older data that you don’t query often to more cost-effective storage.
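
The mapping from instances to data tiers described in the scale-up steps can also be approximated through the API. The sketch below assumes the parsed JSON output of `GET _cat/nodes?format=json&h=name,node.role,disk.used_percent`, where `node.role` is a string of single-letter role abbreviations. The function name, the 90% default, and the sample rows are hypothetical.

```python
# Single-letter role abbreviations used by the cat nodes API for data tiers.
ROLE_TO_TIER = {"h": "hot", "s": "content", "w": "warm", "c": "cold", "f": "frozen"}


def tiers_running_low(node_rows, threshold_percent=90):
    """Return the sorted data tiers of nodes whose disk usage exceeds the threshold."""
    tiers = set()
    for row in node_rows:
        if float(row["disk.used_percent"]) > threshold_percent:
            for letter in row["node.role"]:
                if letter in ROLE_TO_TIER:
                    tiers.add(ROLE_TO_TIER[letter])
    return sorted(tiers)


# Hypothetical sample response; names and figures are made up.
sample_rows = [
    {"name": "instance-0000000001", "node.role": "himrst", "disk.used_percent": "94.10"},
    {"name": "instance-0000000002", "node.role": "w", "disk.used_percent": "40.25"},
]
print(tiers_running_low(sample_rows))
```

The returned tiers are the ones whose **Size per zone** you would consider increasing on the **Edit** page.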


## JVM heap usage exceeds the allowed threshold on master nodes

**Health check**
1. Log in to the [Elastic Cloud Console](https://cloud.elastic.co?page=docs&placement=docs-body).
2. From the Elasticsearch Service panel, click the **Quick link** icon corresponding to the deployment that you want to manage.
   ![Quick link to the deployment page](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-quick-link-to-deployment.png)
3. On your deployment page, scroll down to **Instances** and check if the JVM memory pressure for your Elasticsearch instances is high.
   ![Deployment instances configuration](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-deployment-instances-config.png)

**Possible causes**
- The master node is overwhelmed by a large number of snapshots or shards.
- The master node is overwhelmed by these tasks:
  - External tasks initiated by clients
    - Index, search, update
    - Frequent template updates due to the Beats configuration
  - Internal tasks initiated by users
    - Machine Learning jobs, watches, monitoring, ingest pipelines
  - Internal tasks initiated by Elasticsearch
    - Nodes joining and leaving due to hardware failures
    - Shard allocation due to nodes joining and leaving
    - Configuration of [ILM](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/lifecycle/index-lifecycle-management) policies

**Resolutions**
- If the master node is overwhelmed by external tasks initiated by clients:
  Investigate which clients might be overwhelming the cluster and reduce the request rate or pause ingesting, searching, or updating from the client. If you are using Beats, temporarily stop the Beat that’s overwhelming the cluster to avoid frequent template updates.
- If the master node is overwhelmed by internal tasks initiated by users:
  - Check [cluster-level pending tasks](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-pending-tasks).
  - Reduce the number of Machine Learning jobs or watches.
  - Change the number of ingest pipelines or processors to use less memory.
- If the master node is overwhelmed by internal tasks initiated by Elasticsearch:
  - For nodes joining and leaving, this should resolve itself. If increasing the master node size doesn’t resolve the issue, contact support.
  - For shard allocation, inspect the progress of shard recovery.
    - Make sure `indices.recovery.max_concurrent_operations` is not set too aggressively, which could cause the master to be unavailable.
    - Make sure `indices.recovery.max_bytes_per_sec` is set adequately to avoid impact on ingest and search workloads.
  - Check [ILM](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/lifecycle/index-lifecycle-management) policies to avoid index rollover and relocate actions that are concurrent and aggressive.
- If the master node is overwhelmed by a large number of snapshots, reduce the number of snapshots in the repository.
- If the master node is overwhelmed by a large number of shards, delete unneeded indices and shrink read-only indices to fewer shards. For more information, check [Reduce a cluster’s shard count](/elastic/docs-builder/docs/3016/deploy-manage/production-guidance/optimize-performance/size-shards#reduce-cluster-shard-count).
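
When you check pending tasks, a quick summary can show whether the master is falling behind. The following sketch is one possible way to do that. It assumes the parsed JSON response of `GET _cluster/pending_tasks` (the cluster API rather than the cat API linked above). The function name, the one-second cutoff, and the sample tasks are hypothetical.

```python
from collections import Counter


def summarize_pending_tasks(response, slow_after_ms=1000):
    """Count pending tasks per priority and list sources queued longer than the cutoff."""
    tasks = response.get("tasks", [])
    by_priority = Counter(task["priority"] for task in tasks)
    slow_sources = [
        task["source"]
        for task in tasks
        if task["time_in_queue_millis"] > slow_after_ms
    ]
    return dict(by_priority), slow_sources


# Hypothetical sample response; a healthy cluster usually returns few or no tasks.
sample_response = {
    "tasks": [
        {"insert_order": 101, "priority": "URGENT",
         "source": "create-index [foo_9], cause [api]", "time_in_queue_millis": 86},
        {"insert_order": 46, "priority": "HIGH",
         "source": "put-mapping", "time_in_queue_millis": 4200},
    ]
}
print(summarize_pending_tasks(sample_response))
```

A long list of slow tasks that share the same source can hint at which of the causes above applies.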


## CPU usage exceeds the allowed threshold on master nodes

**Health check**
By default, the allowed CPU usage threshold is set at 85%.
1. Log in to the [Elastic Cloud Console](https://cloud.elastic.co?page=docs&placement=docs-body).
2. From the Elasticsearch Service panel, click the **Quick link** icon corresponding to the deployment that you want to manage.
   ![Quick link to the deployment page](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-quick-link-to-deployment.png)
3. Identify the IDs of your master nodes. On your deployment page, scroll down to **Instances** and filter your instance configuration by master. The IDs of your master nodes are in the title. In this example, the IDs are 21, 26 and 27:
   ![Instances configuration filtered by master nodes ID](https://www.elastic.co/elastic/docs-builder/docs/3016/troubleshoot/images/cloud-ec-instances-filtered-by-master-id.png)
   <note>
   The name of the instance configuration might differ depending on the cloud provider.
   </note>
4. Navigate to the **Performance** page of your deployment. Check if the CPU usage of your master nodes exceeds 85%. Master node names have the format `instance-<ID>`, where `<ID>` is the ID of the master node.

If you use [Stack Monitoring](https://www.elastic.co/elastic/docs-builder/docs/3016/deploy-manage/monitor/monitoring-data/visualizing-monitoring-data), open Kibana from your deployment page and select **Stack Monitoring** from the menu or the search bar.
<note>
  Stack Monitoring comes with out-of-the-box rules, but you need to enable them when prompted.
</note>
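
If the API is reachable, you can run a comparable check against node statistics. The sketch below assumes the parsed JSON response of `GET _nodes/stats/os` and flags master-eligible nodes whose reported CPU usage exceeds the 85% threshold mentioned above. The function name and the sample data are hypothetical, and the value reported by the API may not match the **Performance** page exactly.

```python
def busy_master_nodes(nodes_stats, threshold_percent=85):
    """Return sorted (name, cpu_percent) pairs for master-eligible nodes above the threshold."""
    flagged = []
    for node in nodes_stats["nodes"].values():
        # Only master-eligible nodes are relevant to this symptom.
        if "master" not in node.get("roles", []):
            continue
        cpu_percent = node["os"]["cpu"]["percent"]
        if cpu_percent > threshold_percent:
            flagged.append((node["name"], cpu_percent))
    return sorted(flagged)


# Hypothetical sample response, trimmed to the fields the function reads.
sample_stats = {
    "nodes": {
        "node-id-a": {"name": "instance-0000000021", "roles": ["master"],
                      "os": {"cpu": {"percent": 97}}},
        "node-id-b": {"name": "instance-0000000026", "roles": ["master"],
                      "os": {"cpu": {"percent": 12}}},
        "node-id-c": {"name": "instance-0000000003", "roles": ["data_hot"],
                      "os": {"cpu": {"percent": 99}}},
    }
}
print(busy_master_nodes(sample_stats))
```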

**Possible causes**
- The master node is overwhelmed by a large number of snapshots or shards.
- The memory available on the master node is overwhelmed by these tasks:
  - External tasks initiated by clients
    - Index, search, update
    - Frequent template updates due to the Beats configuration
  - Internal tasks initiated by users
    - Machine Learning jobs, watches, monitoring, ingest pipelines
  - Internal tasks initiated by Elasticsearch
    - Nodes joining and leaving due to hardware failures
    - Shard allocation due to nodes joining and leaving
    - Configuration of [ILM](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/lifecycle/index-lifecycle-management) policies

**Resolutions**
- Navigate to the **Edit** page of your deployment and increase the master node size.
- [Upgrade the cluster](https://www.elastic.co/elastic/docs-builder/docs/3016/deploy-manage/upgrade/deployment-or-cluster) to the latest version.
- If the master node is overwhelmed by external tasks initiated by clients:
  - Reduce the request rate or pause ingesting, searching, or updating from the client.
  - Enable ingest and search-based autoscaling.
  - Stop Beats to avoid frequent template updates.
- If the master node is overwhelmed by internal tasks initiated by users:
  - Check [cluster-level pending tasks](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-pending-tasks).
  - Reduce the number of Machine Learning jobs or watches.
  - Change the number of ingest pipelines or processors to use less memory.
- If the master node is overwhelmed by internal tasks initiated by Elasticsearch:
  - For nodes joining and leaving, this should resolve itself. If increasing the master node size doesn’t resolve the issue, contact support.
  - For shard allocation, inspect the progress of shard recovery. If there’s no progress, contact support.
    - Make sure `indices.recovery.max_concurrent_operations` is not set too aggressively, which could cause the master to be unavailable.
    - Make sure `indices.recovery.max_bytes_per_sec` is set adequately to avoid impact on ingest and search workloads.
  - Check [ILM](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/lifecycle/index-lifecycle-management) policies to avoid index rollover and relocate actions that are concurrent and aggressive.
- If the master node is overwhelmed by a large number of snapshots, reduce the number of snapshots in the repository.
- If the master node is overwhelmed by a large number of shards, reduce the number of shards on the node. For more information, check [Size your shards](https://www.elastic.co/elastic/docs-builder/docs/3016/deploy-manage/production-guidance/optimize-performance/size-shards).


## Some nodes are unavailable and are displayed as missing

**Health check**
- Use the [Metrics inventory](https://www.elastic.co/elastic/docs-builder/docs/3016/solutions/observability/infra-and-hosts/analyze-infrastructure-host-metrics) to identify unavailable or unhealthy nodes. If fewer master nodes are available than the minimum required, Elasticsearch is not available.
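
If some nodes still respond, you can compare the nodes the cluster currently reports with the instances you expect. The sketch below assumes the parsed JSON output of `GET _cat/nodes?format=json&h=name` and a list of expected instance names taken from your deployment page. Both function names and the sample data are hypothetical, and the quorum check is a simplification that treats a quorum as a strict majority of master-eligible nodes.

```python
def missing_nodes(expected_names, cat_nodes_rows):
    """Return the expected node names that the cluster does not currently report."""
    present = {row["name"] for row in cat_nodes_rows}
    return sorted(set(expected_names) - present)


def has_master_quorum(master_eligible_total, master_eligible_present):
    """Simplified check: a quorum is a strict majority of master-eligible nodes."""
    return master_eligible_present > master_eligible_total // 2


# Hypothetical sample data; use the instance names from your own deployment.
expected = ["instance-0000000000", "instance-0000000001", "instance-0000000002"]
reported = [{"name": "instance-0000000000"}, {"name": "instance-0000000002"}]
print(missing_nodes(expected, reported))
print(has_master_quorum(master_eligible_total=3, master_eligible_present=2))
```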

**Possible causes**
- Hardware issue.
- Routing has stopped because of a previous Elasticsearch configuration failure.
- Disk/memory/CPU are saturated.
- The network is saturated or disconnected.
- Nodes are unable to join.

**Resolutions**
- Hardware issue: Any unhealthy hardware detected by the platform is automatically vacated within the hour. If this doesn’t happen, contact support.
- Routing stopped: A failed Elasticsearch configuration might stop the nodes routing. Restart the routing manually to bring the node back to health.
- Disk/memory/CPU saturated:
  - [Delete unused data](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-delete).
    <note>
    You can delete unused data by running either:
    - API calls using the [Kibana console](https://www.elastic.co/elastic/docs-builder/docs/3016/explore-analyze/query-filter/tools/console), if available
    - direct [Elasticsearch API](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/elasticsearch/rest-apis) calls, when Elasticsearch has an elected quorum.
    </note>
  - Increase disk size.
  - [Enable autoscaling](https://www.elastic.co/elastic/docs-builder/docs/3016/deploy-manage/autoscaling).
  - Configure [ILM](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/lifecycle/index-lifecycle-management) policies.
  - [Manage data tiers](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/lifecycle/data-tiers).
- Network saturated or disconnected: Contact support.
- Nodes unable to join: Fix the Elasticsearch configuration. If the nodes still cannot join, contact support.