---
title: Fix index lifecycle management errors
description: Index lifecycle management (ILM) runs actions asynchronously on your cluster's indices, according to the conditions you define in your policy. ILM phases...
url: https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/troubleshoot/elasticsearch/index-lifecycle-management-errors
products:
  - Elasticsearch
applies_to:
  - Elastic Stack: Generally available
---

# Fix index lifecycle management errors
[Index lifecycle management](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management) (ILM) runs actions asynchronously on your cluster's indices, according to the conditions you define in your policy. ILM [phases and actions](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/index-lifecycle) run sequentially on each index, using the permissions of the user who last edited the [ILM policy](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/configure-lifecycle-policy).
ILM can surface two types of issues:
- **Direct errors:** The Elasticsearch API call itself fails.
- **Indirect configuration issues:** The API call succeeds, but the intended result doesn't take effect.

This guide explains how to check overall ILM health, investigate individual indices, and resolve common errors.

## Check for ILM issues

This section covers the symptoms of stuck tasks and erring tasks, then shows common investigative API commands.

### ILM transient steps

ILM intentionally holds an index at two steps while it waits for logic-based or time-based conditions. In [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle) output, the following `phase/action/step` combinations wait:
- `hot/rollover/check-rollover-ready` until [ILM rollover requirements](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/index-lifecycle-actions/ilm-rollover#ilm-rollover-options) are met.
- `*/complete/complete` until the index's `age` qualifies for the next phase's `min_age`. Refer to [how `min_age` is calculated](#min-age-calculation) for more information.

This page refers to steps other than these as _transient_ steps, where ILM asynchronously applies an operation against the index instead of waiting for a logic-based or time-based condition.
ILM [polls for work on an interval basis](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/configuration-reference/index-lifecycle-management-settings) (default `10m`). For more information, refer to [ILM phase transitions](/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/index-lifecycle#ilm-phase-execution).
An index moves through its ILM steps as fast as the underlying operation finishes, plus the wait for the next poll. Transient steps that depend on an asynchronous operation can therefore be affected by [task backlogs](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/troubleshoot/elasticsearch/task-queue-backlog). Common examples:
- `*/migrate/check-migration` monitors the index's [shards' allocation and recoveries](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery).
- `*/*/forcemerge` waits for the index's [force merge](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/index-lifecycle-actions/ilm-forcemerge) to finish. Review that page's performance considerations.
- `delete/wait_for_snapshot/wait-for-snapshot` waits until the [Snapshot lifecycle management (SLM) policy](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/deploy-manage/tools/snapshot-and-restore/create-snapshots) configured for the [ILM Delete](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/index-lifecycle-actions/ilm-delete) phase has successfully completed a snapshot of the index.
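
If an index sits on one of these transient steps for a long time, you can inspect the underlying operation directly. The following requests are a sketch that reuses the `my-index-000001` index name from the walkthrough later on this page. `nightly-snapshots` is a hypothetical SLM policy name, so substitute your own. The SLM policy response includes a `last_success` entry when the policy has run successfully.

```json
# Active shard recoveries for the index (relevant to */migrate/check-migration)
GET _cat/recovery/my-index-000001?active_only=true&v=true

# Running force merge tasks (relevant to */*/forcemerge)
GET _tasks?actions=indices:admin/forcemerge*&detailed=true

# Last run of a hypothetical SLM policy (relevant to wait-for-snapshot)
GET _slm/policy/nightly-snapshots
```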

It's fine if these transient steps appear in the [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle) output. But if an index doesn't progress past a step for an extended period, investigate. The cause is often specific to your setup or use case, rather than a cluster problem.

### ILM erring steps

When errors occur, the [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle) response includes the following:
- `failed_step`, set to the active step name
- `step`, set to `ERROR`
- `is_auto_retryable_error` flag, set
- `failed_step_retry_count`, incremented
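
To see only these error-related fields for a single index, you can combine the ILM explain API with Elasticsearch's standard `filter_path` response filtering. The following is a minimal sketch that uses the `my-index-000001` index from the walkthrough later on this page:

```json
# Return only the error-related ILM fields for one index
GET my-index-000001/_ilm/explain?filter_path=indices.*.step,indices.*.failed_step,indices.*.is_auto_retryable_error,indices.*.failed_step_retry_count,indices.*.step_info
```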

ILM automatically retries all erring indices on each polling interval, as if the [Retry policy API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-retry) were called. During automatic or manual retry:
- The `step` resets to the active step description and `failed_step` is removed.
- The `is_auto_retryable_error` persists.
- The `failed_step_retry_count` persists and increments again if another error is encountered.

Non-erring indices don't report the `failed_step`, `is_auto_retryable_error`, or `failed_step_retry_count` fields, and ILM removes these temporary fields from indices that recover from previous errors. This is why the [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle) supports the `only_errors` flag, which returns only indices that are currently failing or retrying a step:
```json
GET */_ilm/explain?only_errors=true
```

For troubleshooting, the [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle) returns `step_info`. This field appears only when further context is available, such as a `message` for informational updates or a `reason` for errors.
If ILM cannot automatically resolve the error for this index, execution is halted until the underlying issue with the policy, index, or cluster is resolved. For example, shard migrations might block until [Elastic Cloud Autoscaling](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/deploy-manage/autoscaling) scales or adds necessary [data tiers](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/manage-data/lifecycle/data-tiers).
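
When a shard migration is blocked, the cluster allocation explain API reports why a shard can't be assigned, for example because no node in the target tier can accept it. The following sketch assumes the stuck index is `my-index-000001` and checks its first primary shard:

```json
# Explain the allocation decision for primary shard 0 of the index
GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": true
}
```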

### ILM health

Use the following APIs to check ILM health across all indices.
Elasticsearch's [health report API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-health-report) reports `stagnating_indices` for indices that have been attempting a step for longer than expected:
```json
GET _health_report/ilm
```

This report's thresholds are controlled by the following cluster settings, which you can view with the [Read cluster settings API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-get-settings):
- `health.ilm.max_retries_per_step` (default `100`)
- `health.ilm.max_time_on_action` (default `1d`)
- `health.ilm.max_time_on_step` (default `1d`)
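
To see the values currently in effect, include defaults in the cluster settings response and look for the `health.ilm.*` keys. If your workloads legitimately spend longer on a step, you can raise a threshold. This sketch assumes these are dynamic cluster settings in your version, and `2d` is only an illustrative value:

```json
# Show explicit and default cluster settings as flat keys; look for health.ilm.*
GET _cluster/settings?include_defaults=true&flat_settings=true

# Illustrative: report an index only after it spends more than two days on one step
PUT _cluster/settings
{
  "persistent": {
    "health.ilm.max_time_on_step": "2d"
  }
}
```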

The report also includes diagnoses that recommend actions for improving your ILM and cluster health.
For a high-level summary of all index statuses (not just those needing intervention), use the [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle). For example, if you save the API's output for all indices to `ilm_explain.json`, you can count indices by `phase/action/step` with [jq](https://jqlang.github.io/jq/):
```bash
$ jq -c '.indices[] | select(.managed == true) | .phase + "/" + .action + "/" + .step' ilm_explain.json | sort | uniq -c | sort -rn
```
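
The jq command assumes that `ilm_explain.json` already contains an ILM explain response. One way to produce the file is to run the following request and save its JSON response. The `only_managed` parameter limits the output to indices that have an ILM policy, which matches the jq filter's `select(.managed == true)`:

```json
# Explain ILM status for every managed index; save the response as ilm_explain.json
GET */_ilm/explain?only_managed=true
```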

<tip>
  For ILM troubleshooting walkthroughs, refer to:
  - [Monitoring ILM Elasticsearch Health](https://www.youtube.com/watch?v=VCIqkji3IwY) for resolving erring steps.
  - [ILM History Index](https://www.youtube.com/watch?v=onrnnwjYWSQ) for an explanation of step sequences and how to review historical index statuses.
</tip>
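
You can also query the ILM history data stream directly to review the steps an index has gone through. This is a sketch that assumes ILM history is enabled (the `indices.lifecycle.history_index_enabled` setting) and that history documents store the index name in a keyword field named `index`. `expand_wildcards=all` is included in case the history data stream is hidden in your version:

```json
# Most recent ILM history entries for one index
GET ilm-history-*/_search?expand_wildcards=all
{
  "query": {
    "term": { "index": "my-index-000001" }
  },
  "sort": [ { "@timestamp": "desc" } ],
  "size": 20
}
```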


### Troubleshooting ILM for an index

The following example demonstrates troubleshooting ILM for a newly created index. Consider a `shrink-index` policy that [shrinks](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/index-lifecycle-actions/ilm-shrink) an index to four shards once it is at least five days old:
```json
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
```

To [create an index](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-create) `my-index-000001` that has only two primary shards and [apply the ILM policy](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/policy-apply) `shrink-index`:
```json
PUT my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
```
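
To confirm which indices a policy is attached to, the get lifecycle policy API response includes an `in_use_by` object in recent versions. A quick check, using the policy from this example:

```json
# The response's in_use_by.indices list should include my-index-000001
GET _ilm/policy/shrink-index
```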

After five days, ILM attempts to use the [shrink index API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-shrink) to shrink `my-index-000001` from two shards to four. Because the shrink action cannot *increase* the number of shards, this operation fails and ILM sets the index's `step` to `ERROR`.
Use the [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle) to get information about what went wrong:
```json
GET my-index-000001/_ilm/explain
```

The response includes the failed step and the reason for the error:
```json
{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,
      "index_creation_date_millis" : 1541717265865,
      "time_since_index_creation": "5.1d",
      "policy" : "shrink-index",
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d",
      "phase" : "warm",
      "phase_time_millis" : 1541717272601,
      "action" : "shrink",
      "action_time_millis" : 1541717272601,
      "step" : "ERROR",
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink",
      "step_info" : {
        "type" : "illegal_argument_exception",
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : {
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
```

To resolve this, update the policy to shrink the index to a single shard after 5 days:
```json
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
```

After resolving the underlying problem, either wait for ILM to automatically retry the index's `ERROR` step on its next poll, or use the [retry policy API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-retry) to retry it immediately:
```json
POST my-index-000001/_ilm/retry
```

ILM then re-runs the step that failed. You can use the [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle) to monitor the progress.

## Key ILM concepts

The following behaviors come up often when troubleshooting ILM. For more details, refer to the [ILM guide](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management) or [contact us](/elastic/docs-content/pull/6340/troubleshoot#contact-us).

### How `min_age` is calculated

When setting up an [ILM policy](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/configure-lifecycle-policy) or [automating rollover with ILM](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/rollover), be aware that `min_age` can be relative to either the rollover time or the index creation time.
If you use [ILM rollover](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/index-lifecycle-actions/ilm-rollover), `min_age` is calculated relative to the time the index was rolled over. This is because the [rollover API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-rollover) generates a new index and updates the `age` of the previous index to reflect the rollover time. If the index hasn’t been rolled over, its `age` is calculated from the index's `creation_date`.
You can override how `min_age` is calculated using the `index.lifecycle.origination_date` and `index.lifecycle.parse_origination_date` [ILM settings](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/configuration-reference/index-lifecycle-management-settings).
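
For example, to base `min_age` on a known timestamp instead of the creation or rollover time, you can set `index.lifecycle.origination_date` (in epoch milliseconds) on an existing index. This is a sketch; the value is an arbitrary illustrative timestamp that corresponds to 2018-11-01T00:00:00Z:

```json
# Illustrative timestamp; ILM then calculates the index's age from this date
PUT my-index-000001/_settings
{
  "index.lifecycle.origination_date": 1541030400000
}
```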

### Steps proceed sequentially

ILM does not skip steps because of logic-based or time-based conditions. It proceeds through all steps [in the enabled action's order](/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/index-lifecycle#phases-availability). For example, an index stuck at the `phase/action/step` of `warm/migrate/check-migration` can remain past its expected deletion time, because it can't reach the delete phase until the migration completes. Make sure to review and resolve ILM errors to maintain a healthy cluster. For more information, refer to [ILM phase transitions](/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/index-lifecycle#ilm-phase-transitions).

### Policy changes apply forward

When an index enters a phase, ILM caches the policy's current definition of that phase for the index. For more information, refer to [phase execution](/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/index-lifecycle#ilm-phase-execution). This protects the index from policy changes that might cause data corruption.
As described in [how changes are applied](/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/policy-updates#ilm-apply-changes), ILM applies safe updates to an index's `phase_execution` immediately. Updates that aren't safe to apply retroactively are forward-applied, taking effect only as indices enter the phase after the update.
You might need to apply a policy change to indices that are already stagnant. It's not possible to run a single ILM step on demand, because doing so might corrupt the index. Instead, apply the relevant changes to those indices manually.
In rare cases, a policy change can leave indices stagnant. The only fix is the [move to an ILM step API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-move-to-step). This is an advanced API, so [contact us](/elastic/docs-content/pull/6340/troubleshoot#contact-us) with an [Elasticsearch diagnostic](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/troubleshoot/elasticsearch/diagnostic) before using it.

## Common ILM errors

Each entry below shows the message you'll see in the `ERROR` step, the cause, and the recommended fix. Errors are grouped by the ILM action where they typically occur.

### Rollover errors

<tip>
  Problems with rollover aliases are a common cause of errors. Consider using [data streams](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/manage-data/data-store/data-streams) instead of managing rollover with aliases.
</tip>

These errors can occur when the [ILM rollover](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/index-lifecycle-actions/ilm-rollover) action runs:
<dropdown title="Rollover alias [x] can point to multiple indices, found duplicated alias [x] in index template [z]">
  The target rollover alias is specified in an index template’s `index.lifecycle.rollover_alias` setting. You need to explicitly configure this alias *one time* when you [bootstrap the initial index](/elastic/docs-content/pull/6340/manage-data/lifecycle/index-lifecycle-management/tutorial-time-series-without-data-streams#ilm-gs-alias-bootstrap). The rollover action then manages setting and updating the alias to [roll over](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-rollover#rollover-index-api-desc) to each subsequent index. Do not explicitly configure this same alias in the aliases section of an index template. For an example, refer to this [resolving duplicate alias video](https://www.youtube.com/watch?v=Ww5POq4zZtY).
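
  As a sketch of the bootstrap pattern, assume a hypothetical alias named `my-alias` that the index template references in `index.lifecycle.rollover_alias`. You set the alias only on the initial index and leave it out of the template's `aliases` section:

  ```json
  # Bootstrap the first index as the write index for the hypothetical alias
  PUT my-index-000001
  {
    "aliases": {
      "my-alias": {
        "is_write_index": true
      }
    }
  }
  ```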
</dropdown>

<dropdown title="index.lifecycle.rollover_alias [x] does not point to index [y]">
  Either the index is using the wrong alias or the alias does not exist. Check the `index.lifecycle.rollover_alias` [index setting](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-get-settings). To see what aliases are configured, use [_cat/aliases](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-aliases). For an example, refer to this [resolving alias not pointing to index video](https://www.youtube.com/watch?v=NKSe67x7aw8).
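
  To compare the two, check the index setting and the alias membership side by side. The following is a sketch that uses the hypothetical alias name `my-alias`:

  ```json
  # The rollover alias ILM expects for this index
  GET my-index-000001/_settings/index.lifecycle.rollover_alias

  # The indices the hypothetical alias currently points to
  GET _cat/aliases/my-alias?v=true
  ```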
</dropdown>

<dropdown title="setting [index.lifecycle.rollover_alias] for index [y] is empty or not defined">
  The `index.lifecycle.rollover_alias` setting must be configured for the rollover action to work. Update the index settings to set `index.lifecycle.rollover_alias`. For an example, refer to this [resolving empty or not defined video](https://www.youtube.com/watch?v=LRpMC2GS_FQ).
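
  The following is a minimal sketch of that update, assuming the index should roll over a hypothetical alias named `my-alias`:

  ```json
  # Tell ILM which (hypothetical) alias to roll over for this index
  PUT my-index-000001/_settings
  {
    "index.lifecycle.rollover_alias": "my-alias"
  }
  ```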
</dropdown>

<dropdown title="alias [x] has more than one write index [y,z]">
  Only one index can be designated as the write index for a particular alias. Use the [aliases](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-update-aliases) API to set `"is_write_index": false` for all but one index. For an example, refer to this [resolving more than one write index video](https://www.youtube.com/watch?v=jCUvZCT5Hm4).
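
  The following is a sketch that assumes `my-index-000001` and a hypothetical `my-index-000002` both claim to be the write index for a hypothetical alias `my-alias`, and that the newer index should keep the role. The update aliases API applies all actions in one request atomically:

  ```json
  # Keep only the newest index as the write index for the hypothetical alias
  POST _aliases
  {
    "actions": [
      { "add": { "index": "my-index-000001", "alias": "my-alias", "is_write_index": false } },
      { "add": { "index": "my-index-000002", "alias": "my-alias", "is_write_index": true } }
    ]
  }
  ```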
</dropdown>

<dropdown title="index name [x] does not match pattern ^.*-\d+">
  The index name must match the regex pattern `^.*-\d+` for the rollover action to work. The most common problem is that the index name does not contain trailing digits. For example, `my-index` does not match the pattern requirement. Append a numeric value to the index name, for example `my-index-000001`. For an example, refer to this [resolving does not match pattern video](https://www.youtube.com/watch?v=9sp1zF6iL00).
</dropdown>


### ILM migrate errors

The following errors usually surface during shard recovery, which can occur when you use [ILM migrate](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/index-lifecycle-actions/ilm-migrate) operations or [ILM searchable snapshots](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/index-lifecycle-actions/ilm-searchable-snapshot). Because these operations run asynchronously, the error reported by ILM often shows only a symptom of the real problem. To troubleshoot the underlying cause, refer to [cluster allocation API examples](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/troubleshoot/elasticsearch/cluster-allocation-api-examples).
<dropdown title="index has a preference for tiers [xxx] and node does not meet the required [xxx] tier">
  If the [allocation explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-allocation-explain) returns this error, it indicates that shards cannot be assigned according to the current attribute-based or data tier allocation rules. For detailed guidance on resolving this issue, refer to [Unable to assign shards based on the allocation rule](https://www.elastic.co/docs/troubleshoot/monitoring/unavailable-shards#ec-cannot-assign-shards-on-allocation-rule).
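
  To compare the tier the index prefers with the roles your nodes actually have, you can use the following sketch for the `my-index-000001` index. In the `node.role` column, `h`, `w`, and `c` stand for the hot, warm, and cold roles:

  ```json
  # Tier preference set on the index
  GET my-index-000001/_settings/index.routing.allocation.include._tier_preference

  # Roles of each node
  GET _cat/nodes?v=true&h=name,node.role
  ```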
</dropdown>


### General ILM errors

The following errors can surface on any ILM step.
<dropdown title="CircuitBreakingException: [x] data too large, data for [y]">
  This indicates that the cluster is hitting resource limits. Before continuing to set up ILM, you’ll need to take steps to alleviate the resource issues. For more information, see [Circuit breaker errors](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/troubleshoot/elasticsearch/circuit-breaker-errors).
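
  To see which circuit breaker is tripping and how close each one is to its limit, the node stats API can return only the breaker statistics:

  ```json
  # Per-node circuit breaker usage, limits, and trip counts
  GET _nodes/stats/breaker
  ```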
</dropdown>

<dropdown title="high disk watermark [x] exceeded on [y]">
  This indicates that the cluster is running out of disk space. This can happen when index lifecycle management isn't set up to roll over indices and move them from hot to warm nodes. For more information, see [Watermark errors](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/troubleshoot/elasticsearch/fix-watermark-errors).
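
  To see which nodes are closest to the watermarks, check the per-node disk usage:

  ```json
  # Disk used, disk available, and shard count for each node
  GET _cat/allocation?v=true
  ```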
</dropdown>

<dropdown title="security_exception: action [<action-name>] is unauthorized for user [<user-name>] with roles [<role-name>], this action is granted by the index privileges [manage_follow_index,manage,all]">
  ILM runs each action as the user who last modified the policy, with the privileges they held at that time. This error means the action requires privileges that user doesn't have. To fix it, make sure the account that creates or modifies the policy has the necessary privileges for every operation the policy includes. If this error surfaces on system indices, refer to [File-based access recovery](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6340/troubleshoot/elasticsearch/file-based-recovery).
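
  To check whether an account holds the required privileges, run the has privileges API while authenticated as the user who last modified the policy. The following is a sketch that assumes a hypothetical `my-index-*` index pattern and checks the `manage` index privilege named in the error, plus the `manage_ilm` cluster privilege:

  ```json
  # Run as the policy's last editor; the response lists each privilege as true or false
  GET _security/user/_has_privileges
  {
    "cluster": ["manage_ilm"],
    "index": [
      {
        "names": ["my-index-*"],
        "privileges": ["manage"]
      }
    ]
  }
  ```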
</dropdown>

<dropdown title="policy [<policy-name>] does not exist">
  This error occurs because the index is assigned to an ILM policy that does not exist in the cluster. To fix this, you can either [create the missing policy](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-put-lifecycle) with the required settings or [link the index to an existing ILM policy](https://docs-v3-preview.elastic.dev/elastic/elasticsearch/tree/main/reference/elasticsearch/configuration-reference/index-lifecycle-management-settings#index-lifecycle-name).
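
  The following sketch shows the second option. It lists the policies that exist and then points the index at one of them. `shrink-index` is the example policy from earlier on this page:

  ```json
  # List the ILM policies defined in the cluster
  GET _ilm/policy

  # Assign an existing policy to the index
  PUT my-index-000001/_settings
  {
    "index.lifecycle.name": "shrink-index"
  }
  ```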
</dropdown>