Airflow Integration

<div class="condensed-table">
| | |
| --- | --- |
| Version | 0.9.1 [beta] |
| Compatible Kibana version(s) | 8.13.0 or higher |
| Supported Serverless project types | Security, Observability |
| Subscription level | Basic |
| Level of support | Elastic |

</div>

Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. It allows users to define workflows as Directed Acyclic Graphs (DAGs) of tasks, which are then executed by the Airflow scheduler on an array of workers while following the specified dependencies.

Use the Airflow integration to:

  • Collect detailed metrics from Airflow using StatsD to gain insights into system performance.
  • Create informative visualizations to track usage trends, measure key metrics, and derive actionable business insights.
  • Monitor your workflows' performance and status in real time.

The Airflow integration collects metrics.

Metrics give you insight into the state of Airflow. The integration collects them through a single data stream, statsd, which lets you monitor and troubleshoot the performance of your Airflow instance.

Data stream:

  • statsd: Collects metrics related to scheduler activities, pool usage, task execution details, executor performance, and worker states in Airflow.
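
To make the shape of this data concrete, the following is a minimal sketch of how a single StatsD line can be parsed. It assumes the common StatsD text format `<name>:<value>|<type>[|@<sample_rate>]`; the `StatsdMetric` class and `parse_statsd_line` function are hypothetical names used only for illustration, and the metric names in the usage lines are examples, not an exhaustive list of what Airflow emits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatsdMetric:
    """One parsed StatsD line (hypothetical helper type for illustration)."""
    name: str
    value: float
    kind: str  # for example "c" (counter), "g" (gauge), "ms" (timer)
    sample_rate: Optional[float] = None


def parse_statsd_line(line: str) -> StatsdMetric:
    """Parse '<name>:<value>|<type>[|@<sample_rate>]' into a StatsdMetric.

    Assumes numeric values; set-type metrics with string members are
    out of scope for this sketch.
    """
    name, sep, rest = line.strip().partition(":")
    if not name or not sep or not rest:
        raise ValueError(f"not a StatsD line: {line!r}")

    parts = rest.split("|")
    if len(parts) < 2 or not parts[1]:
        raise ValueError(f"missing metric type: {line!r}")

    sample_rate = None
    for extra in parts[2:]:
        # An optional sample rate is written as '@0.1'.
        if extra.startswith("@"):
            sample_rate = float(extra[1:])

    return StatsdMetric(name, float(parts[0]), parts[1], sample_rate)


# Illustrative lines only; actual names depend on your Airflow version.
gauge = parse_statsd_line("executor.open_slots:32|g")
counter = parse_statsd_line("scheduler_heartbeat:1|c|@0.5")
print(gauge)
print(counter)
```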

Note:

  • You can view the ingested Airflow metrics in Discover by using the metrics-* index pattern.

The Airflow integration has been tested with Airflow 2.4.0. It is expected to work with versions 2.0.0 and later.

You need Elasticsearch to store and search your data, and Kibana to visualize and manage it. You can use the hosted Elasticsearch Service on Elastic Cloud, which is recommended, or self-manage the Elastic Stack on your own hardware.

To ingest data from Airflow, you need StatsD to receive the metrics that Airflow emits.

For step-by-step instructions on how to set up an integration, see the Getting started guide.

Be sure to follow the official Airflow Installation Guide for the correct installation of Airflow.

Add the following lines to your Airflow configuration file (for example, airflow.cfg). Leave statsd_prefix empty and replace %HOST% with the address of the host where the Agent is running:

[metrics]
statsd_on = True
statsd_host = %HOST%
statsd_port = 8125
statsd_prefix =
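
As a quick sanity check of these settings, the sketch below reads configuration text with Python's standard `configparser` and reports anything that deviates from the lines above. The `check_metrics_section` function is a hypothetical helper, not part of Airflow or the integration, and it inspects only the four keys shown here.

```python
import configparser
from typing import List


def check_metrics_section(cfg_text: str) -> List[str]:
    """Return a list of problems found in the [metrics] section (empty if none)."""
    # interpolation=None keeps a literal '%HOST%' from raising an
    # interpolation error when the value is read.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    if not parser.has_section("metrics"):
        return ["missing [metrics] section"]

    metrics = parser["metrics"]
    problems = []

    try:
        enabled = metrics.getboolean("statsd_on", fallback=False)
    except ValueError:
        enabled = False
    if not enabled:
        problems.append("statsd_on is not True")

    host = metrics.get("statsd_host", fallback="").strip()
    if not host or host == "%HOST%":
        problems.append("statsd_host is unset or still the %HOST% placeholder")

    port = metrics.get("statsd_port", fallback="").strip()
    if not port.isdigit() or not 0 < int(port) < 65536:
        problems.append("statsd_port is not a valid port number")

    if metrics.get("statsd_prefix", fallback="").strip():
        problems.append("statsd_prefix should be left empty")

    return problems


# Example: the placeholder has not been replaced yet.
sample = """
[metrics]
statsd_on = True
statsd_host = %HOST%
statsd_port = 8125
statsd_prefix =
"""
for problem in check_metrics_section(sample):
    print(problem)
```

To check a real file, pass its contents, for example `check_metrics_section(open("airflow.cfg").read())`, adjusting the path to wherever your configuration file lives.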

Once the integration is set up, you can click on the Assets tab in the Airflow integration to see a list of available dashboards. Choose the dashboard that corresponds to your configured data stream. The dashboard should be populated with the required data.

  • Check if the StatsD server is receiving data from Airflow by examining the logs for potential errors.
  • Make sure the %HOST% placeholder in the Airflow configuration file is replaced with the correct address of the machine where the StatsD server is running.
  • If Airflow metrics are not being emitted, confirm that the [metrics] section in the airflow.cfg file is properly configured as per the instructions above.
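
One way to exercise the first two checks is to send a single test packet by hand. The sketch below assumes the listener accepts StatsD over UDP on the configured port; `send_test_metric` and the metric name `connectivity_test` are hypothetical and chosen only for illustration. Because UDP is fire-and-forget, a successful send does not prove the packet arrived, so confirm receipt in the listener's logs or in Discover.

```python
import socket


def send_test_metric(host: str, port: int = 8125,
                     name: str = "connectivity_test", value: int = 1) -> bytes:
    """Send one StatsD counter line over UDP and return the bytes that were sent."""
    payload = f"{name}:{value}|c".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # No reply is expected; delivery has to be verified on the receiving side.
        sock.sendto(payload, (host, port))
    return payload
```

For example, `send_test_metric("192.0.2.10")` sends `connectivity_test:1|c` to port 8125 on a placeholder address; substitute the address you used for %HOST%. Note that the test metric may show up alongside your real Airflow metrics.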

This is the statsd data stream, which collects metrics related to scheduler activities, pool usage, task execution details, executor performance, and worker states in Airflow.

ECS Field Reference

Please refer to the following document for detailed information on ECS fields.