Kibana task management
Kibana Task Manager is used by features such as Alerting, Actions, and Reporting to run mission-critical work as persistent background tasks. These background tasks distribute work across multiple Kibana instances. This has three major benefits:
- Persistence: All task state and scheduling is stored in Elasticsearch, so if you restart Kibana, tasks will pick up where they left off.
- Scaling: Multiple Kibana instances can read from and update the same task queue in Elasticsearch, allowing the workload to be distributed across instances. If a Kibana instance no longer has capacity to run tasks, you can increase capacity by adding Kibana instances. For more information on scaling, see Kibana task manager scaling considerations.
- Load Balancing: Task Manager is equipped with a reactive self-healing mechanism, which allows it to reduce the amount of work it executes in reaction to an increased load-related error rate in Elasticsearch. Additionally, when Task Manager experiences an increase in recurring tasks, it attempts to space out the work to better balance the load.
Task definitions for alerts and actions are stored in the index called .kibana_task_manager.
You must have at least one replica of this index for production deployments.
If you lose this index, all scheduled alerts and actions are lost.
Kibana background tasks are managed as follows:
- An Elasticsearch task index is polled for overdue tasks at 3-second intervals. You can change this interval using the xpack.task_manager.poll_interval setting.
- Tasks are claimed by updating them in the Elasticsearch index, using optimistic concurrency control to prevent conflicts. Each Kibana instance can run a maximum of 10 concurrent tasks, so a maximum of 10 tasks are claimed each interval.
- Elasticsearch and Kibana instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as Network Time Protocol.
- Tasks are run on the Kibana server.
- Task Manager ensures that tasks:
  - Are only executed once
  - Are retried when they fail (if configured to do so)
  - Are rescheduled to run again at a future point in time (if configured to do so)
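The claim step in the list above can be sketched as follows. This is a minimal illustration of optimistic concurrency control, not Kibana's actual implementation: `TaskStore`, `claimFromSnapshot`, and `pollOnce` are hypothetical names, and an in-memory map with a `version` counter stands in for the Elasticsearch index and its conditional writes.

```typescript
// Hypothetical in-memory stand-in for the Elasticsearch task index.
// None of these names come from Kibana; they are for illustration only.
interface TaskDoc {
  id: string;
  runAt: number; // epoch milliseconds at which the task becomes due
  status: "idle" | "claiming";
  ownerId: string | null;
  version: number; // incremented on every successful write
}

class TaskStore {
  private docs = new Map<string, TaskDoc>();

  add(id: string, runAt: number): void {
    this.docs.set(id, { id, runAt, status: "idle", ownerId: null, version: 1 });
  }

  // Return copies of idle tasks whose runAt has passed, most overdue first.
  findOverdue(now: number, limit: number): TaskDoc[] {
    return Array.from(this.docs.values())
      .filter((d) => d.status === "idle" && d.runAt <= now)
      .sort((a, b) => a.runAt - b.runAt)
      .slice(0, limit)
      .map((d) => ({ ...d }));
  }

  // Conditional write: succeeds only if the caller read the latest version.
  updateIfVersionMatches(doc: TaskDoc, expectedVersion: number): boolean {
    const current = this.docs.get(doc.id);
    if (!current || current.version !== expectedVersion) {
      return false; // another instance updated the task first
    }
    this.docs.set(doc.id, { ...doc, version: expectedVersion + 1 });
    return true;
  }
}

// Try to claim each candidate; keep only the ones whose write was accepted.
function claimFromSnapshot(
  store: TaskStore,
  instanceId: string,
  candidates: TaskDoc[]
): string[] {
  const claimed: string[] = [];
  for (const task of candidates) {
    const accepted = store.updateIfVersionMatches(
      { ...task, status: "claiming", ownerId: instanceId },
      task.version
    );
    if (accepted) {
      claimed.push(task.id);
    }
  }
  return claimed;
}

// One poll cycle: read up to `capacity` overdue tasks, then try to claim them.
function pollOnce(
  store: TaskStore,
  instanceId: string,
  now: number,
  capacity: number
): string[] {
  return claimFromSnapshot(store, instanceId, store.findOverdue(now, capacity));
}
```

In this sketch, if two instances read the same overdue tasks before either one writes, only the first writer's update matches the stored version. The second instance's writes are rejected, so no task is claimed twice, and that instance picks up other tasks on its next poll.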
It is possible for tasks to run late or at an inconsistent schedule.
This is usually a symptom of the specific usage or scaling strategy of the cluster in question.
To address these issues, tweak the Kibana Task Manager settings or the cluster scaling strategy to better suit the unique use case.
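As a rough aid when deciding between tuning settings and adding instances, the polling numbers above give an upper bound on throughput. The sketch below is a back-of-the-envelope estimate only: the function names are hypothetical, and it assumes every task finishes within one poll interval and that instances do not contend for the same tasks, so real throughput will be lower.

```typescript
// Upper bound on tasks one instance can claim per minute, given how often
// it polls and how many tasks it can run concurrently.
function maxTasksPerMinute(
  pollIntervalMs: number,
  maxConcurrentTasks: number
): number {
  if (pollIntervalMs <= 0 || maxConcurrentTasks <= 0) {
    throw new RangeError("pollIntervalMs and maxConcurrentTasks must be positive");
  }
  const pollsPerMinute = 60000 / pollIntervalMs;
  return pollsPerMinute * maxConcurrentTasks;
}

// Minimum number of instances needed to reach a required throughput,
// under the same optimistic assumptions.
function instancesNeeded(
  requiredTasksPerMinute: number,
  pollIntervalMs: number,
  maxConcurrentTasks: number
): number {
  const perInstance = maxTasksPerMinute(pollIntervalMs, maxConcurrentTasks);
  return Math.ceil(requiredTasksPerMinute / perInstance);
}
```

With the defaults described above (a 3-second poll interval and 10 concurrent tasks), `maxTasksPerMinute(3000, 10)` evaluates to 200, and `instancesNeeded(450, 3000, 10)` evaluates to 3.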
For details on the settings that can influence the performance and throughput of Task Manager, see Task Manager Settings.
For detailed troubleshooting guidance, see Troubleshooting.