---
title: Cluster fault detection
description: The elected master periodically checks each of the nodes in the cluster to ensure that they are still connected and healthy. Each node in the cluster...
url: https://www.elastic.co/elastic/docs-builder/docs/3028/deploy-manage/distributed-architecture/discovery-cluster-formation/cluster-fault-detection
products:
  - Elasticsearch
applies_to:
  - Elastic Stack: Generally available
---

# Cluster fault detection
The elected master periodically checks each of the nodes in the cluster to ensure that they are still connected and healthy. Each node in the cluster also periodically checks the health of the elected master. These checks are known respectively as *follower checks* and *leader checks*.

Elasticsearch allows these checks to occasionally fail or time out without taking any action. It considers a node to be faulty only after a number of consecutive checks have failed. You can control fault detection behavior with [`cluster.fault_detection.*` settings](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3028/reference/elasticsearch/configuration-reference/discovery-cluster-formation-settings).

If the elected master detects that a node has disconnected, however, it treats this as an immediate failure: it bypasses the timeout and retry settings and attempts to remove the node from the cluster. Similarly, if a node detects that the elected master has disconnected, it treats this as an immediate failure: it bypasses the timeout and retry settings and restarts its discovery phase to try to find or elect a new master.
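
The consecutive-failure rule and the disconnection shortcut described above can be modeled in a few lines. This is a minimal Python sketch, not Elasticsearch's implementation: the `FaultDetector` class and `CheckResult` values are hypothetical, and `retry_count` only loosely stands in for settings such as `cluster.fault_detection.follower_check.retry_count`. It shows that isolated failures or timeouts are tolerated, that a successful check resets the counter, and that a disconnection is treated as faulty immediately.

```python
from enum import Enum, auto


class CheckResult(Enum):
    """Hypothetical outcomes of a single follower or leader check."""
    OK = auto()
    FAILED = auto()        # the check returned an error
    TIMED_OUT = auto()     # no response within the check timeout
    DISCONNECTED = auto()  # the connection to the peer was lost


class FaultDetector:
    """Illustrative model of consecutive-failure fault detection.

    Assumption: this mirrors the behavior described in the text, not the
    actual Elasticsearch classes.
    """

    def __init__(self, retry_count: int = 3) -> None:
        self.retry_count = retry_count
        self.consecutive_failures = 0

    def record(self, result: CheckResult) -> bool:
        """Record one check result; return True if the peer is now faulty."""
        if result is CheckResult.DISCONNECTED:
            # A disconnection is an immediate failure: skip the retry count.
            return True
        if result is CheckResult.OK:
            # A successful check resets the run of consecutive failures.
            self.consecutive_failures = 0
            return False
        # FAILED or TIMED_OUT: tolerated until too many happen in a row.
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.retry_count


if __name__ == "__main__":
    detector = FaultDetector(retry_count=3)
    for r in [CheckResult.FAILED, CheckResult.OK, CheckResult.TIMED_OUT,
              CheckResult.FAILED, CheckResult.FAILED]:
        print(r.name, detector.record(r))
```

In this sketch only the last of the five results marks the peer as faulty, because the `OK` in the middle breaks the run of failures.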

Additionally, each node periodically verifies that its data path is healthy by writing a small file to disk and then deleting it again. If a node discovers that its data path is unhealthy, it is removed from the cluster until the data path recovers. You can control this behavior with the [`monitor.fs.health` settings](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3028/reference/elasticsearch/configuration-reference/discovery-cluster-formation-settings).
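
The write-then-delete probe described above can be sketched as a single function. This is an illustration under stated assumptions: the name `data_path_is_healthy`, the probe file prefix, and the explicit `fsync` are hypothetical choices, and the sketch does not model the `monitor.fs.health` settings or how Elasticsearch schedules the check.

```python
import os
import tempfile


def data_path_is_healthy(data_path: str) -> bool:
    """Hypothetical health probe: write a small file, then delete it.

    Any I/O error (missing directory, read-only mount, full disk, ...)
    marks the path as unhealthy.
    """
    probe = None
    try:
        # Create a uniquely named probe file inside the data path.
        fd, probe = tempfile.mkstemp(prefix=".health-probe-", dir=data_path)
        with os.fdopen(fd, "wb") as f:
            f.write(b"ok")
            f.flush()
            os.fsync(f.fileno())  # force the bytes to reach the disk
        os.remove(probe)
        probe = None
        return True
    except OSError:
        return False
    finally:
        # Best-effort cleanup if the probe was created but not removed.
        if probe is not None and os.path.exists(probe):
            try:
                os.remove(probe)
            except OSError:
                pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(data_path_is_healthy(d))                          # True
        print(data_path_is_healthy(os.path.join(d, "missing")))  # False
```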

The elected master also removes nodes from the cluster if they are unable to apply an updated cluster state within a reasonable time. This timeout defaults to 2 minutes, measured from the start of the cluster state update. Refer to [Publishing the cluster state](/elastic/docs-builder/docs/3028/deploy-manage/distributed-architecture/discovery-cluster-formation/cluster-state-overview#cluster-state-publishing) for a more detailed description.
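
Because the deadline is measured from the start of the update rather than from when each node received it, the removal decision reduces to simple arithmetic. The sketch below is a simplified model: the `Publication` class, its fields, and the use of plain float timestamps are hypothetical, and it collapses Elasticsearch's publication process into a single 2-minute deadline.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Default described in the text: 2 minutes from the start of the update.
APPLY_TIMEOUT_SECONDS = 120.0


@dataclass
class Publication:
    """Hypothetical record of one cluster state update."""
    version: int
    started_at: float  # seconds, on the master's clock
    applied_by: dict[str, float] = field(default_factory=dict)

    def mark_applied(self, node: str, at: float) -> None:
        self.applied_by[node] = at

    def nodes_to_remove(self, nodes: list[str], now: float,
                        timeout: float = APPLY_TIMEOUT_SECONDS) -> list[str]:
        """Return nodes that did not apply this state before the deadline."""
        if now - self.started_at < timeout:
            return []  # the deadline has not passed yet
        deadline = self.started_at + timeout
        return [n for n in nodes
                if n not in self.applied_by or self.applied_by[n] > deadline]


if __name__ == "__main__":
    pub = Publication(version=42, started_at=1000.0)
    pub.mark_applied("node-1", 1000.5)
    pub.mark_applied("node-2", 1090.0)
    nodes = ["node-1", "node-2", "node-3"]
    print(pub.nodes_to_remove(nodes, now=1100.0))  # []
    print(pub.nodes_to_remove(nodes, now=1120.0))  # ['node-3']
```

Here `node-2` applies the state 90 seconds in and is kept, while `node-3` never applies it and becomes eligible for removal once 120 seconds have elapsed since the update began.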

## Troubleshooting an unstable cluster

See [*Troubleshooting an unstable cluster*](https://www.elastic.co/elastic/docs-builder/docs/3028/troubleshoot/elasticsearch/troubleshooting-unstable-cluster).

### Diagnosing `disconnected` nodes

See [Diagnosing `disconnected` nodes](/elastic/docs-builder/docs/3028/troubleshoot/elasticsearch/troubleshooting-unstable-cluster#troubleshooting-unstable-cluster-disconnected).

### Diagnosing `lagging` nodes

See [Diagnosing `lagging` nodes](/elastic/docs-builder/docs/3028/troubleshoot/elasticsearch/troubleshooting-unstable-cluster#troubleshooting-unstable-cluster-lagging).

### Diagnosing `follower check retry count exceeded` nodes

See [Diagnosing `follower check retry count exceeded` nodes](/elastic/docs-builder/docs/3028/troubleshoot/elasticsearch/troubleshooting-unstable-cluster#troubleshooting-unstable-cluster-follower-check).

### Diagnosing `ShardLockObtainFailedException` failures

See [Diagnosing `ShardLockObtainFailedException` failures](/elastic/docs-builder/docs/3028/troubleshoot/elasticsearch/troubleshooting-unstable-cluster#troubleshooting-unstable-cluster-shardlockobtainfailedexception).

### Diagnosing other network disconnections

See [Diagnosing other network disconnections](/elastic/docs-builder/docs/3028/troubleshoot/elasticsearch/troubleshooting-unstable-cluster#troubleshooting-unstable-cluster-network).