---
title: Fix repeated snapshot policy failures
description: Repeated snapshot failures are usually an indicator of a problem with your deployment. Continuous failures of automated snapshots can leave a deployment...
url: https://www.elastic.co/elastic/docs-builder/docs/3028/troubleshoot/elasticsearch/repeated-snapshot-failures
products:
  - Elasticsearch
applies_to:
  - Elastic Stack: Generally available
---

# Fix repeated snapshot policy failures
Repeated snapshot failures are usually an indicator of a problem with your deployment. Continuous failures of automated snapshots can leave a deployment without recovery options in cases of data loss or outages.
<admonition title="Simplify monitoring with AutoOps">
  AutoOps is a [monitoring](https://www.elastic.co/elastic/docs-builder/docs/3028/deploy-manage/monitor) tool that simplifies cluster management through performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. Learn more about [AutoOps](https://www.elastic.co/elastic/docs-builder/docs/3028/deploy-manage/monitor/autoops).
</admonition>

Elasticsearch keeps track of the number of repeated failures when executing automated snapshots with [snapshot lifecycle management (SLM)](/elastic/docs-builder/docs/3028/deploy-manage/tools/snapshot-and-restore/create-snapshots#automate-snapshots-slm) policies. If an automated snapshot fails too many times without a successful execution, the health API reports a warning. The number of repeated failures before reporting a warning is controlled by the [`slm.health.failed_snapshot_warn_threshold`](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3028/reference/elasticsearch/configuration-reference/snapshot-restore-settings#slm-health-failed-snapshot-warn-threshold) setting.
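If the default threshold is too strict or too lenient for your environment, you can adjust it dynamically through the cluster update settings API. A minimal sketch; the value `3` below is illustrative, not a recommendation:

```json
PUT _cluster/settings
{
  "persistent": {
    "slm.health.failed_snapshot_warn_threshold": 3
  }
}
```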

## Review snapshot policy failures

If an automated SLM policy execution is experiencing repeated failures, follow these steps to get more information about the problem:
<tab-set>
  <tab-item title="Using Kibana">
    In Kibana, you can view all configured SLM policies and review their status and execution history. If the UI does not provide sufficient details about the failure, use the Console to retrieve the [snapshot policy information](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle) with the Elasticsearch API.
    1. Go to **Snapshot and Restore > Policies** to see the list of configured policies. You can find the **Snapshot and Restore** management page using the navigation menu or the [global search field](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/find-and-organize/find-apps-and-objects).
       ![](https://www.elastic.co/elastic/docs-builder/docs/3028/troubleshoot/images/elasticsearch-reference-slm-policies.png)

    2. The policies table lists all configured policies. Click any policy to review its details and execution history.
    3. To get more detailed information about the failure, open Kibana **Dev Tools > Console**. You can find the **Console** using the navigation menu or the [global search field](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/find-and-organize/find-apps-and-objects).
       Once the Console is open, execute the steps described in the **Using the Elasticsearch API** tab to retrieve the affected SLM policy information.
  </tab-item>

  <tab-item title="Using the Elasticsearch API">
    The following step can be run using either the [Kibana console](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/query-filter/tools/console) or direct [Elasticsearch API](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3028/reference/elasticsearch/rest-apis) calls.

    [Retrieve the affected SLM policy](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle):
    ```json
    GET _slm/policy/affected-policy-name
    ```
    The response looks like this:
    ```json
    {
      "affected-policy-name": { 
        "version": 1,
        "modified_date": "2099-05-06T01:30:00.000Z",
        "modified_date_millis": 4081757400000,
        "policy" : {
          "schedule": "0 30 1 * * ?",
          "name": "<daily-snap-{now/d}>",
          "repository": "my_repository",
          "config": {
            "indices": ["data-*", "important"],
            "ignore_unavailable": false,
            "include_global_state": false
          },
          "retention": {
            "expire_after": "30d",
            "min_count": 5,
            "max_count": 50
          }
        },
        "last_success" : {
          "snapshot_name" : "daily-snap-2099.05.30-tme_ivjqswgkpryvnao2lg",
          "start_time" : 4083782400000,
          "time" : 4083782400000
        },
        "last_failure" : { 
          "snapshot_name" : "daily-snap-2099.06.16-ywe-kgh5rfqfrpnchvsujq",
          "time" : 4085251200000, 
          "details" : """{"type":"snapshot_exception","reason":"[daily-snap-2099.06.16-ywe-kgh5rfqfrpnchvsujq] failed to create snapshot successfully, 5 out of 149 total shards failed"}""" 
        },
        "stats": {
          "policy": "daily-snapshots",
          "snapshots_taken": 0,
          "snapshots_failed": 0,
          "snapshots_deleted": 0,
          "snapshot_deletion_failures": 0
        },
        "next_execution": "2099-06-17T01:30:00.000Z",
        "next_execution_millis": 4085343000000
      }
    }
    ```
  </tab-item>
</tab-set>
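Before drilling into a specific policy, you can also check what the health API itself is reporting for SLM. A sketch using the health report API's `slm` indicator; the diagnosis details in the response vary by cluster:

```json
GET _health_report/slm
```

A `yellow` status with a repeated-failure diagnosis points at the policy whose failures crossed the warning threshold.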


## Possible causes

Snapshots can fail for a variety of reasons. If the failures are due to configuration errors, consult the documentation for the repository type that the snapshot policy is using. Refer to the [guide on managing repositories in ECE](https://www.elastic.co/elastic/docs-builder/docs/3028/deploy-manage/tools/snapshot-and-restore/cloud-enterprise) if you are using an Elastic Cloud Enterprise deployment.

One common failure scenario is repository corruption. This occurs most often when multiple instances of Elasticsearch write to the same repository location. There is a [separate troubleshooting guide](https://www.elastic.co/elastic/docs-builder/docs/3028/troubleshoot/elasticsearch/diagnosing-corrupted-repositories) to fix this problem.
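If you suspect a repository-level problem, verifying the repository confirms whether all nodes can read from and write to it. A sketch, assuming the repository is named `my_repository` as in the example policy above:

```json
POST _snapshot/my_repository/_verify
```

The response lists the nodes that successfully verified the repository; nodes missing from the list, or an error response, indicate a connectivity or permissions problem with the repository storage.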
If snapshots are failing for other reasons, check the logs on the elected master node during the snapshot execution period for more information.
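After addressing the underlying cause, you can trigger the policy manually instead of waiting for the next scheduled run to confirm that snapshots succeed again. A sketch, assuming the policy ID is `daily-snapshots` as in the example stats above:

```json
POST _slm/policy/daily-snapshots/_execute
```

A successful execution resets the repeated-failure count tracked by SLM, which clears the health API warning.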