﻿---
title: Tutorial: Getting started with anomaly detection
description: Ready to take anomaly detection for a test drive? Follow this tutorial to: Try out the Data Visualizer, Create anomaly detection jobs for the Kibana sample...
url: https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-getting-started
products:
  - Elasticsearch
  - Machine Learning
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Tutorial: Getting started with anomaly detection
Ready to take anomaly detection for a test drive? Follow this tutorial to:
- Try out the **Data Visualizer**
- Create anomaly detection jobs for the Kibana sample data
- Use the results to identify possible anomalies in the data

At the end of this tutorial, you should have a good idea of what machine learning is and will hopefully be inspired to use it to detect anomalies in your own data.
Need more context? Check out the [Elasticsearch introduction](https://www.elastic.co/elastic/docs-builder/docs/3028/get-started) to learn the lingo and understand the basics of how Elasticsearch works.

## Requirements

- Before you can play with the machine learning features, you must install Elasticsearch and Kibana. Elasticsearch stores the data and the analysis results. Kibana provides a helpful user interface for creating and viewing jobs.
  <tip>
  You can run Elasticsearch and Kibana on your own hardware or on [Elastic Cloud](https://www.elastic.co/cloud/). [Try out Elastic Cloud for free](https://cloud.elastic.co/registration?elektra=en-ess-sign-up-page).
  </tip>
- Verify that your environment is set up properly to use the machine learning features. If the Elasticsearch security features are enabled, to complete this tutorial you need a user that has authority to manage anomaly detection jobs. See [Setup and security](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/setting-up-machine-learning).
- [Add the sample data sets that ship with Kibana](/elastic/docs-builder/docs/3028/explore-analyze#gs-get-data-into-kibana). In this tutorial, you’ll use the **Sample web logs**.

These data sets are now ready be analyzed in machine learning jobs in Kibana.

## Explore the data in Kibana

To get the best results from machine learning analytics, you must understand your data. You must know its data types and the range and distribution of values. The Data Visualizer enables you to explore the fields in your data:
1. Open Kibana in your web browser. If you are running Kibana locally, go to `http://localhost:5601/`.
   <tip>
   The Kibana machine learning features use pop-ups. You must configure your web browser so that it does not block pop-up windows or create an exception for your Kibana URL.
   </tip>
2. Open **Machine Learning** from the main menu, or use the [global search field](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/find-and-organize/find-apps-and-objects).
3. Select the **Data Visualizer** tab.
4. Click **Select data view** and choose the `kibana_sample_data_logs` data view.
5. Use the time filter to select a time period that you’re interested in exploring. Alternatively, click **Use full data** to view the full time range of data.
6. Optional: You can change the random sampling behavior, which affects the number of documents per shard that are used in the Data Visualizer. You can use automatic random sampling that balances accuracy and speed, manual sampling where you can chose a value for the sampling percentage, or you can turn the feaure off to use the full data set. There is a relatively small number of documents in the Kibana sample data, so you can turn random sampling off. For larger data sets, keep in mind that using a large sample size increases query run times and increases the load on the cluster.
7. Explore the fields in the Data Visualizer.
   You can filter the list by field names or [field types](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3028/reference/elasticsearch/mapping-reference/field-data-types). The Data Visualizer indicates how many of the documents in the sample for the selected time period contain each field.
   In particular, look at the `clientip`, `response.keyword`, and `url.keyword` fields, since we’ll use them in our anomaly detection jobs. For these fields, the Data Visualizer provides the number of distinct values, a list of the top values, and the number and percentage of documents that contain the field. For example:
   ![Data Visualizer output for ip and keyword fields](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-data-keyword.jpg)
   For numeric fields, the Data Visualizer provides information about the minimum, median, maximum, and top values, the number of distinct values, and their distribution. You can use the distribution chart to get a better idea of how the values in the data are clustered. For example:
   ![Data Visualizer for sample web logs](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-data-metric.jpg)

<tip>
  Make note of the range of dates in the `@timestamp` field. They are relative to when you added the sample data and you’ll need that information later in the tutorial.
</tip>

Now that you’re familiar with the data in the `kibana_sample_data_logs` index, you can create some anomaly detection jobs to analyze it.
<tip>
  You can view the statistics of the selectable fields in the anomaly detection wizard. The field statistics displayed in a flyout provide more meaningful context to help you select relevant fields.
</tip>


## Create sample anomaly detection jobs in Kibana

<important>
  The results on this page might be different than the actual values you get when using the sample data sets. This behavior is expected as the data points in the data sets might change over time.
</important>

The Kibana sample data sets include some pre-configured anomaly detection jobs for you to play with. You can use either of the following methods to add the jobs:
- After you load the [sample web logs data set](/elastic/docs-builder/docs/3028/explore-analyze#gs-get-data-into-kibana), click **View data** > **ML jobs**.
- In the Machine Learning app, when you select the `kibana_sample_data_logs` data views in the **Data Visualizer** or the **Anomaly Detection** job wizards, it recommends that you create a job using its known configuration. Select the **Kibana sample data web logs** configuration.
- Accept the default values and click **Create Jobs**.

The wizard creates three jobs and three datafeeds.
<admonition title="Datafeeds, buckets, and detectors">
  A *datafeed* retrieves time series data from Elasticsearch and provides it to an anomaly detection job for analysis.The job uses *buckets* to divide the time series into batches for processing. For example, all three sample web log jobs use a bucket span of 1 hour.Each anomaly detection job contains one or more *detectors*, which define the type of analysis that occurs (for example, `max`, `average`, or `rare` analytical functions) and the fields that are analyzed. Some of the analytical functions look for single anomalous data points. For example, `max` identifies the maximum value that is seen within a bucket. Others perform some aggregation over the length of the bucket. For example, `mean` calculates the mean of all the data points seen within the bucket.For more information, see [Datafeeds](/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs#ml-ad-datafeeds), [Buckets](/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs#ml-ad-create-job), and [*Function reference*](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-functions).
</admonition>

If you want to see all of the configuration details for your jobs and datafeeds, you can do so on the **Stack Management** > **Anomaly Detection Jobs** page, which you can access from the navigation menu or using the [global search bar](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/find-and-organize/find-apps-and-objects). Alternatively, you can see the configuration files in [GitHub](https://github.com/elastic/kibana/tree/master/x-pack/platform/plugins/shared/ml/server/models/data_recognizer/modules/sample_data_weblogs). For the purposes of this tutorial, however, here’s a quick overview of the goal of each job:
- `low_request_rate` uses the `low_count` function to find unusually low request rates
- `response_code_rates` uses the `count` function and partitions the analysis by `response.keyword` values to find unusual event rates by HTTP response code
- `url_scanning` uses the `high_distinct_count` function and performs population analysis on the `clientip` field to find client IPs accessing an unusually high distinct count of URLs

The next step is to view the results and see what types of insights these jobs have generated!

## View anomaly detection results

After the datafeeds are started and the anomaly detection jobs have processed some data, you can view the results in Kibana.
<tip>
  Depending on the capacity of your machine, you might need to wait a few seconds for the machine learning analysis to generate initial results.
</tip>

![Create jobs for the sample web logs](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-web-results.png)

The machine learning features analyze the input stream of data, model its behavior, and perform analysis based on the detectors in each job. When an event occurs outside of the model, that event is identified as an anomaly. You can immediately see that all three jobs have found anomalies, which are indicated by red blocks in the swim lanes for each job. Swim lane colors map to the anomaly score: blue indicates low, yellow indicates medium, and red indicates high severity, with different shades representing varying levels within each category.
There are two tools for examining the results from anomaly detection jobs in Kibana: the **Anomaly Explorer** and the **Single Metric Viewer**. You can switch between these tools by clicking the icons in the top left corner. You can also edit the job selection to examine a different subset of anomaly detection jobs.

### Single metric job results

One of the sample jobs (`low_request_rate`), is a *single metric anomaly detection job*. It has a single detector that uses the `low_count` function and limited job properties. You might use a job like this if you want to determine when the request rate on your web site drops significantly.
Let’s start by looking at this simple job in the **Single Metric Viewer**:
1. Select the **Anomaly Detection Jobs** tab in **Machine Learning** to see the list of your anomaly detection jobs.
2. Click the chart icon in the **Actions** column for your `low_request_rate` job to view its results in the **Single Metric Viewer**.
3. Use the relative mode of the date picker to select a start date one week in the past and an end date one month in the future to cover the majority of the analyzed data points.

![Single Metric Viewer for low_request_rate job](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-job1-analysis.jpg)

This view contains a chart that represents the actual and expected values over time. It is available only if the job has `model_plot_config` enabled. It can display only a single time series.
The blue line in the chart represents the actual data values. The shaded blue area represents the bounds for the expected values. The area between the upper and lower bounds are the most likely values for the model, using a 95% confidence level. That is to say, there is a 95% chance of the actual value falling within these bounds. If a value is outside of this area then it will usually be identified as anomalous.
If you slide the time selector from the beginning to the end of the data, you can see how the model improves as it processes more data. At the beginning, the expected range of values is pretty broad and the model is not capturing the periodicity in the data. But it quickly learns and begins to reflect the patterns in your data.
<admonition title="Anomaly scores">
  Any data points outside the range that was predicted by the model are marked as anomalies. In order to provide a sensible view of the results, an *anomaly score* is calculated for each bucket time interval. The anomaly score is a value from 0 to 100, which indicates the significance of the anomaly compared to previously seen anomalies. The highly anomalous values are shown in red and the low scored values are indicated in blue. An interval with a high anomaly score is significant and requires investigation.
</admonition>

Slide the time selector to a section of the time series that contains a red anomaly data point. If you hover over the point, you can see more information.
<note>
  You might notice a high spike in the time series. It’s not highlighted as an anomaly, however, since this job looks for low counts only.
</note>

For each anomaly, you can see key details such as the time, the actual and expected ("typical") values, and their probability in the **Anomalies** section of the viewer. For example:
![Single Metric Viewer Anomalies for low_request_rate job](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-job1-anomalies.jpg)

In the **Actions** column, there are additional options, such as **Raw data** which generates a query for the relevant documents in **Discover**. You can optionally add more links in the actions menu with [custom URLs](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-configuring-url).
By default, the table contains all anomalies that have a severity of "warning" or higher in the selected section of the timeline. If you are only interested in critical anomalies, for example, you can change the severity threshold for this table.
The **Anomaly explanation** section gives you further insights about each anomaly, such as its type and impact, to make it easier to interpret the job results.
You can optionally annotate your job results by drag-selecting a period of time in the **Single Metric Viewer** and adding a description. Annotations are notes that refer to events in a specific time period. They can be created by the user or generated automatically by the anomaly detection job to reflect model changes and noteworthy occurrences.
![A user annotation in the Single Metric Viewer](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-user-annotation.jpg)

After you have identified anomalies, often the next step is to try to determine the context of those situations. For example, are there other factors that are contributing to the problem? Are the anomalies confined to particular applications or servers? You can begin to troubleshoot these situations by layering additional jobs or creating multi-metric jobs.

### Advanced or multi-metric job results

Conceptually, you can think of *multi-metric anomaly detection jobs* as running multiple independent single metric jobs. By bundling them together in a multi-metric job, however, you can see an overall score and shared influencers for all the metrics and all the entities in the job. Multi-metric jobs therefore scale better than having many independent single metric jobs. They also provide better results when you have influencers that are shared across the detectors.
<admonition title="Influencers">
  When you create an anomaly detection job, you can identify fields as *influencers*. These are fields that you think contain information about someone or something that influences or contributes to anomalies. There are influencers in both the `response_code_rates` and `url_scanning` jobs.As a best practice, do not pick too many influencers. For example, you generally do not need more than three. If you pick many influencers, the results can be overwhelming and there is a small overhead to the analysis. For more details, see [Influencers](/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs#ml-ad-influencers).
</admonition>

You can also configure your anomaly detection jobs to split a single time series into multiple time series based on a categorical field. For example, the `response_code_rates` job has a single detector that splits the data based on the `response.keyword` and then uses the `count` function to determine when the number of events is anomalous. You might use a job like this if you want to look at both high and low request rates partitioned by response code.
Let’s start by looking at the `response_code_rates` job in the **Anomaly Explorer**:
1. Select the **Jobs** tab in **Machine Learning** to see the list of your anomaly detection jobs.
2. Open the `response_code_rates` job in the Anomaly Explorer to view its results by clicking the corresponding icon in the row of the job.

For this particular job, you can choose to see separate swim lanes for each client IP or response code. For example:
![Anomaly explorer for response_code_rates job](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-job2-explorer.jpg)

Since the job uses `response.keyword` as its *partition field*, the analysis is segmented such that you have completely different baselines for each distinct value of that field. By looking at temporal patterns on a per entity basis, you might spot things that might have otherwise been hidden in the lumped view.
Under the anomaly timeline, there is a section that contains annotations. You can filter the type of events by using the selector on the right side of the **Annotations** section.
![Annotations section in the Anomaly Explorer](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-annotations.jpg)

On the left side of the **Anomaly Explorer**, there is a list of the top influencers for all of the detected anomalies in that same time period. The list includes maximum anomaly scores, which in this case are aggregated for each influencer, for each bucket, across all detectors. There is also a total sum of the anomaly scores for each influencer. You can use this list to help you narrow down the contributing factors and focus on the most anomalous entities.
Click on a section in the swim lanes to obtain more information about the anomalies in that time period. For example, click on the red section in the swim lane for the `response.keyword` value of `404`:
![Anomaly charts for the response_code_rates job](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-job2-explorer-anomaly.jpg)

You can see exact times when anomalies occurred. If there are multiple detectors or metrics in the job, you can see which caught the anomaly. You can also switch to viewing this time series in the **Single Metric Viewer** by clicking the **View Series** button in the **Actions** menu.
Below the charts, there is a table that provides more information, such as the typical and actual values and the influencers that contributed to the anomaly. For example:
![Anomaly tables for the response_code_rates job](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-job2-explorer-table.jpg)

If your job has multiple detectors, the table aggregates the anomalies to show the highest severity anomaly per detector and entity, which is the field value that is displayed in the **found for** column. To view all the anomalies without any aggregation, set the **Interval** to `Show all`.
In this sample data, the spike in the 404 response codes is influenced by a specific client. Situations like this might indicate that the client is accessing unusual pages or scanning your site to see if they can access unusual URLs. This anomalous behavior merits further investigation.
<tip>
  The anomaly scores that you see in each section of the **Anomaly Explorer** might differ slightly. This disparity occurs because for each job there are bucket results, influencer results, and record results. Anomaly scores are generated for each type of result. The anomaly timeline uses the bucket-level anomaly scores. The list of top influencers uses the influencer-level anomaly scores. The list of anomalies uses the record-level anomaly scores.
</tip>


### Population job results

The final sample job (`url_scanning`) is a *population anomaly detection job*. As we saw in the `response_code_rates` job results, there are some clients that seem to be accessing unusually high numbers of URLs. The `url_scanning` sample job provides another method for investigating that type of problem. It has a single detector that uses the `high_distinct_count` function on the `url.keyword` to detect unusually high numbers of distinct values in that field. It then analyzes whether that behavior differs over the population of clients, as defined by the `clientip` field.
If you examine the results from the `url_scanning` anomaly detection job in the **Anomaly Explorer**, you’ll notice its charts have a different format. For example:
![Anomaly charts for the url_scanning job](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-job3-explorer.jpg)

In this case, the metrics for each client IP are analyzed relative to other client IPs in each bucket and we can once again see that the `30.156.16.164` client IP is behaving abnormally.
If you want to play with another example of a population anomaly detection job, add the sample eCommerce orders data set. Its `high_sum_total_sales` job determines which customers have made unusual amounts of purchases relative to other customers in each bucket of time. In this example, there are anomalous events found for two customers:
![Anomaly charts for the high_sum_total_sales job](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-job4-explorer.jpg)


## Create forecasts

In addition to detecting anomalous behavior in your data, you can use the machine learning features to predict future behavior.
To create a forecast in Kibana:
1. View your job results (for example, for the `low_request_rate` job) in the **Single Metric Viewer**. To find that view, navigate to **Machine Learning** > **Single Metric Viewer** and select the job.
2. Click **Forecast**.
   ![Create a forecast from the Single Metric Viewer](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-forecast.png)
3. Specify a duration for your forecast. This value indicates how far to extrapolate beyond the last record that was processed. You must use [time units](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3028/reference/elasticsearch/rest-apis/api-conventions#time-units). In this example, the duration is one week (`1w`):
   ![Specify a duration of 1w](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-duration.png)
4. View the forecast in the **Single Metric Viewer**:
   ![View a forecast from the Single Metric Viewer](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-forecast-results.png)
   The yellow line in the chart represents the predicted data values. The shaded yellow area represents the bounds for the predicted values, which also gives an indication of the confidence of the predictions. Note that the bounds generally increase with time (that is to say, the confidence levels decrease), since you are forecasting further into the future. Eventually if the confidence levels are too low, the forecast stops.
5. Optional: Compare the forecast to actual data.
   ![View a forecast over actual data in the Single Metric Viewer](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/images/machine-learning-ml-gs-forecast-actual.png)

As the job processes more data, you can click the **Forecast** button again and choose to see one of your forecasts overlaid on the actual data. The chart then contains the actual data values, the bounds for the expected values, the anomalies, the forecast data values, and the bounds for the forecast. This combination of actual and forecast data gives you an indication of how well the machine learning features can extrapolate the future behavior of the data.
If you want to see this type of comparison for the Kibana sample data, which has a finite number of documents, you can reset the job and analyze only a subset of the data before you create a forecast. For example, reset one of your anomaly detection jobs from the **Job Management** page in Kibana or use the [reset anomaly detection jobs API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-reset-job). When you restart the datafeed for this job, choose a date part way through your sample data as the search end date. By default, the datafeed stops and the anomaly detection job closes when it reaches that date. Create the forecast. You can then restart the datafeed to process the remaining data and generate the type of results shown here.
<tip>
  The Kibana sample data sets have timestamps that are relative to when you added the data sets. However, some of these dates are in the future. Therefore, for the purposes of this tutorial, when you restart your datafeed do not use the **No end time (Real-time search)** option. Specify the appropriate end dates so that it processes all of the data immediately.
</tip>

Now that you have seen how easy it is to create forecasts with the sample data, consider what type of events you might want to predict in your own data. For more information and ideas, see [Forecast future behavior](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ml-ad-forecast).

## Next steps

By completing this tutorial, you’ve learned how you can detect anomalous behavior in a simple set of sample data. You created anomaly detection jobs in Kibana, which opens jobs and creates and starts datafeeds for you under the covers. You examined the results of the machine learning analysis in the **Single Metric Viewer** and **Anomaly Explorer** in Kibana. You also extrapolated the future behavior of a job by creating a forecast.
If you’re now thinking about where anomaly detection can be most impactful for your own data, there are three things to consider:
1. It must be time series data.
2. It should be information that contains key performance indicators for the health, security, or success of your business or system. The better you know the data, the quicker you will be able to create jobs that generate useful insights.
3. Ideally, the data is located in Elasticsearch and you can therefore create a datafeed that retrieves data in real time.  If your data is outside of Elasticsearch, you cannot use Kibana to create your jobs and you cannot use datafeeds.

In general, it is a good idea to start with single metric anomaly detection jobs for your key performance indicators. After you examine these simple analysis results, you will have a better idea of what the influencers might be. You can create multi-metric jobs and split the data or create more complex analysis functions as necessary. For examples of more complicated configuration options, see [Examples](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/anomaly-how-tos).
If you want to find more sample jobs, see [Supplied configurations](https://www.elastic.co/elastic/docs-builder/docs/3028/explore-analyze/machine-learning/anomaly-detection/ootb-ml-jobs). In particular, there are sample jobs for [Apache](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/machine-learning/ootb-ml-jobs-apache) and [Nginx](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/machine-learning/ootb-ml-jobs-nginx) that are quite similar to the examples in this tutorial.
If you encounter problems, we’re here to help. If you are an existing Elastic customer with a support contract, create a ticket in the [Elastic Support portal](http://support.elastic.co). Or post in the [Elastic forum](https://discuss.elastic.co/).