Watermark errors
When a data node is critically low on disk space and has reached the flood-stage disk usage watermark, the following error is logged: Error: disk usage exceeded flood-stage watermark, index has read-only-allow-delete block.
To prevent a full disk, when a node reaches this watermark, Elasticsearch blocks writes to any index with a shard on the node. If the block affects related system indices, Kibana and other Elastic Stack features may become unavailable. For example, Kibana may report the Kibana Server is not Ready yet error.
Elasticsearch will automatically remove the write block when the affected node’s disk usage falls below the high disk watermark. To achieve this, Elasticsearch attempts to rebalance some of the affected node’s shards to other nodes in the same data tier.
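To see how close each node is to the watermarks, you can check per-node disk usage with the cat allocation API. This is a quick sketch; the sort parameter is optional and simply lists the fullest nodes first:
GET _cat/allocation?v=true&s=disk.percent:desc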
AutoOps is a monitoring tool that simplifies cluster management through performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. Learn more about AutoOps.
Elasticsearch uses disk-based shard allocation watermarks to prevent disk overuse and protect against data loss. Until a node reaches the flood-stage watermark, indexing is not blocked and shards can continue to grow on disk. Default watermark thresholds and their effects:
- 75% (none) – In the Cloud UI (ECE and ECH), the disk bar appears red. Elasticsearch takes no action.
- 85% (low) – Stops allocating new primary or replica shards to the affected node(s).
- 90% (high) – Moves shards away from the affected node(s).
- 95% (flood-stage) – Sets all indices on the affected node(s) to read-only; indexing on those nodes stops. This is automatically reverted once the node’s usage drops below the high watermark.
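To review the watermark thresholds currently in effect, including defaults, one option is to query the cluster settings API; the filter_path parameter only trims the response and can be omitted:
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*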
To verify that shards are moving off the affected node until it falls below the high watermark, use the cat shards API and cat recovery API:
GET _cat/shards?v=true
GET _cat/recovery?v=true&active_only=true
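If the full output is noisy, the cat APIs also accept h (columns) and s (sort) parameters. For example, this sketch lists only a few shard columns and sorts by state so relocating shards are grouped together:
GET _cat/shards?v=true&h=index,shard,prirep,state,store,node&s=state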
If shards remain on the node, keeping it above the high watermark, use the cluster allocation explain API to get an explanation of their allocation status.
GET _cluster/allocation/explain
{
"index": "my-index",
"shard": 0,
"primary": false
}
Watermark errors occur when a node’s disk usage exceeds the configured thresholds (low, high, or flood-stage). While these thresholds protect cluster stability, they can be triggered by several underlying factors including:
- Sudden ingestion of large volumes of data, often referred to as large indexing bursts, can quickly consume disk space, especially if the cluster is not sized for peak loads. Refer to Indexing performance considerations for guidance.
- Inefficient index settings, unnecessary stored fields, and suboptimal document structures can increase disk consumption. See Tune for disk usage for guidance on reducing storage requirements.
- A high number of replicas can quickly multiply storage requirements, as each replica consumes the same disk space as the primary shard; see the example after this list. Refer to Index settings for details.
- Oversized shards can make disk usage spikes more likely and slow down recovery and rebalancing. Learn more in Size your shards.
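For instance, if replica overhead is the immediate problem, you can lower the replica count for an index while you free up capacity. This is only a sketch: my-index-000001 is a placeholder name, and the right number of replicas depends on your resilience requirements.
PUT my-index-000001/_settings
{
  "index.number_of_replicas": 1
}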
To immediately restore write operations, you can temporarily increase disk watermarks and remove the write block.
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": "90%",
"cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
"cluster.routing.allocation.disk.watermark.high": "95%",
"cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
"cluster.routing.allocation.disk.watermark.flood_stage": "97%",
"cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
"cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
"cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "5GB"
}
}
PUT */_settings?expand_wildcards=all
{
"index.blocks.read_only_allow_delete": null
}
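To confirm the block is gone, one way is to query the setting directly with the get index settings API; indices that still carry the block will show it in the response:
GET */_settings/index.blocks.read_only_allow_delete?expand_wildcards=all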
Once a long-term solution is in place, reset or reconfigure the disk watermarks:
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": null,
"cluster.routing.allocation.disk.watermark.low.max_headroom": null,
"cluster.routing.allocation.disk.watermark.high": null,
"cluster.routing.allocation.disk.watermark.high.max_headroom": null,
"cluster.routing.allocation.disk.watermark.flood_stage": null,
"cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": null,
"cluster.routing.allocation.disk.watermark.flood_stage.frozen": null,
"cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": null
}
}
To resolve watermark errors permanently, perform one of the following actions:
- Horizontally scale nodes of the affected data tiers.
- Vertically scale existing nodes to increase disk space.
- Delete indices using the delete index API, either permanently if the index isn’t needed, or temporarily if you can restore it later from a snapshot; see the example after this list.
- Update the related ILM policy to push indices through to later data tiers.
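If you go the deletion route, a minimal sketch follows; my-old-index is a placeholder, and if the data may be needed again, confirm it is captured in a snapshot before deleting:
DELETE my-old-index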
To reduce the likelihood of watermark errors:
- Implement more restrictive ILM policies to delete or move data sooner, helping keep disk usage under control; see the sketch after this list. Refer to Index lifecycle management.
- Enable Autoscaling to automatically adjust resources based on storage and performance needs.
- Configure Stack monitoring and enable disk usage monitoring alerts to track disk usage trends and identify increases before watermark thresholds are exceeded.
- Optimize shard sizes to balance disk usage (and performance), avoiding a mix of overly large and small shards. Refer to Size your shards.
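As an illustration of a more restrictive ILM policy, the following sketch rolls indices over and then deletes them 30 days after rollover. The policy name and every threshold here are placeholders; adapt them to your own retention requirements.
PUT _ilm/policy/my-short-retention-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}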
On Elastic Cloud Hosted and Elastic Cloud Enterprise, a red cluster health status can block attempted changes; in that case, indices may need to be temporarily deleted through the Elasticsearch API console and later restored from a snapshot. If you experience issues with this resolution flow, reach out to Elastic Support for assistance.