Watermark errors

When a data node runs critically low on disk space, its disk-based shard allocation watermark settings trigger to protect the node's remaining disk. The default watermark thresholds, a summary of Elasticsearch's response at each, and the corresponding Elasticsearch log messages are:

  • 85% low: Elasticsearch stops allocating shards to the affected node(s), except for primary shards of newly created indices.
    low disk watermark [85%] exceeded on [NODE_ID][NODE_NAME] free: Xgb[X%], replicas will not be assigned to this node
    		
  • 90% high: Elasticsearch attempts to relocate shards away from the affected node(s).
    high disk watermark [90%] exceeded on [NODE_ID][NODE_NAME] free: Xgb[X%], shards will be relocated away from this node
    		
  • 95% flood-stage: Elasticsearch enforces a read-only (read_only_allow_delete) block on every index with a shard on the affected node(s). The write block is automatically removed once disk usage on the affected node falls below the high watermark. See the sketch after this list for checking which indices carry the block.
    flood-stage watermark [95%] exceeded on [NODE_ID][NODE_NAME], all indices on this node will be marked read-only
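
To check which indices currently carry the block, one option is to query index settings and filter the response. A minimal sketch; the filter_path trimming is optional:

GET */_settings?expand_wildcards=all&filter_path=*.settings.index.blocks

Indices with no block settings are omitted from the filtered response.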
    		
Note

At 75% disk usage, the Elastic Cloud Console displays a red disk indicator for the node to signal elevated usage. This threshold is a visual indicator only and is not tied to any Elasticsearch watermark or disk-enforcement behavior. No Elasticsearch allocation or write restrictions are applied at this stage.

To prevent the disk from filling up completely, when a node reaches the flood-stage watermark Elasticsearch blocks writes to any index with a shard on the affected node(s). If the block affects system indices, Kibana and other Elastic Stack features can become unavailable. For example, the flood-stage block can cause errors such as:

  • Kibana's Kibana Server is not Ready yet error message.
  • Elasticsearch's ingest APIs rejecting requests with HTTP 429 error bodies like:
    {
      "reason": "index [INDEX_NAME] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];",
      "type": "cluster_block_exception"
    }
    		

The following are some common setup issues leading to watermark errors:

  • Sudden ingestion of large volumes of data that consumes more disk than anticipated during peak load testing. Refer to Indexing performance considerations for guidance.
  • Inefficient index settings, unnecessary stored fields, and suboptimal document structures can increase disk consumption. Refer to Tune for disk usage for guidance.
  • A high number of replicas can quickly multiply storage requirements, as each replica consumes the same disk space as its primary shard. Refer to Index settings for details; a sketch for lowering an index's replica count follows this list.
  • Oversized shards can make disk usage spikes more likely and slow down recovery and rebalancing. Refer to Size your shards for guidance.
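
If an index carries more replicas than it needs, lowering its replica count frees the disk used by the extra copies. A minimal sketch, assuming a hypothetical index name and target value:

PUT my-index/_settings
{
  "index.number_of_replicas": 1
}
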
Simplify monitoring with AutoOps

AutoOps is a monitoring tool that simplifies cluster management through performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. Learn more about AutoOps.

To track disk usage over time, enable monitoring using one of the following options, depending on your deployment type:

To verify that shards are moving off the affected node until it falls below the high watermark, use the following Elasticsearch APIs. A point-in-time disk usage check is sketched after this list.

  • Cluster health API to check relocating_shards.

    GET _cluster/health

  • cat recovery API to check the number of recovering shards and their progress, reported as bytes percent (bp) out of total bytes (tb).

    GET _cat/recovery?v=true&expand_wildcards=all&active_only=true&h=time,tb,bp,top,ty,st,snode,tnode,idx,sh&s=time:desc
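
For a point-in-time view of per-node disk usage to complement the recovery progress above, the cat allocation API shows how full each node is. A sketch; the column selection and sort here are only an example:

GET _cat/allocation?v=true&h=node,shards,disk.percent,disk.used,disk.avail,disk.total&s=disk.percent:desc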
    		

If shards remain on the node and keep it above the high watermark:

You should normally wait for Elasticsearch to rebalance the cluster on its own. Advanced users who determine that specific shards should move off the node sooner, whether because of forecasted ingestion rates or existing disk usage, can use the cluster reroute API to move a chosen shard immediately to a chosen target node.
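
If you do take that route, a minimal sketch of such a reroute follows; the index name, shard number, and node names are placeholders:

POST _cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "my-index",
        "shard": 0,
        "from_node": "node-above-watermark",
        "to_node": "node-with-free-disk"
      }
    }
  ]
}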

To immediately restore write operations, you can temporarily increase disk watermarks and remove the write block.

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "5GB"
  }
}

PUT */_settings?expand_wildcards=all
{
  "index.blocks.read_only_allow_delete": null
}

Once a long-term solution is in place, reset or reconfigure the disk watermarks:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.low.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.high.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": null
  }
}

Note

Elasticsearch recommends using the default watermark settings. Advanced users can override the watermark thresholds and headroom values, but doing so risks leaving too little disk space for background processes such as force merges, mismatching the thresholds against data ingestion rates and index lifecycle management settings, and hitting disk-full errors if disk usage reaches 100%.

To resolve watermark errors permanently, perform one of the following actions:

  • Horizontally scale nodes of the affected data tiers.
  • Vertically scale existing nodes to increase disk space. Ensure nodes within a data tier are scaled to matching hardware profiles to avoid hot spotting.
  • Delete indices using the delete index API, either permanently if the index isn’t needed, or temporarily so you can later restore it from a snapshot, as shown in the sketch below.
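
As a sketch of the temporary-deletion flow, with placeholder index, repository, and snapshot names, deleting an index and later restoring it from a snapshot could look like:

DELETE my-old-logs

POST _snapshot/my_repository/my_snapshot/_restore
{
  "indices": "my-old-logs"
}
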
Tip

On Elastic Cloud Hosted and Elastic Cloud Enterprise, you might need to temporarily delete indices using the Elasticsearch API Console. This can resolve a status: red cluster health status, which blocks deployment changes. After resolving the issue, you can restore the indices from a snapshot. If you experience issues with this resolution flow, reach out to Elastic Support for assistance.

To reduce the likelihood of watermark errors: