Handling delayed data

Delayed data are documents that are indexed late. That is to say, it is data related to a time that your datafeed has already processed and it is therefore never analyzed by your anomaly detection job.

When you create a datafeed, you can specify a query_delay setting. This setting enables the datafeed to wait for some time past real-time, which means any "late" data in this period is fully indexed before the datafeed tries to gather it. However, if the setting is set too low, the datafeed may query for data before it has been indexed and consequently miss that document. Conversely, if it is set too high, analysis drifts farther away from real-time. The balance that is struck depends upon each use case and the environmental factors of the cluster.

Important

If you get an error that says Datafeed missed XXXX documents due to ingest latency, consider increasing the value of query_delay. If it doesn’t help, investigate the ingest latency and its cause. You can do this by comparing event and ingest timestamps. High latency is often caused by bursts of ingested documents, misconfiguration of the ingest pipeline, or misalignment of system clocks.