TSDS guidelines
This page describes how to enable TSDS functionality in your integration packages. Full details about TSDS can be found in Time series data stream in the Elasticsearch documentation.
A time series is a sequence of observations for a specific entity. TSDS enables column-oriented functionality in Elasticsearch by co-locating the data and optimizing storage and aggregations to take advantage of such co-location.
Integrations are one of the biggest sources of input data to Elasticsearch. Enabling TSDS on integration packages can be achieved with minimal changes to the `fields.yml` and `manifest.yml` files of a package.
Data streams of type `logs` are excluded from TSDS migration.
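For context, a data stream's type is declared in its `manifest.yml`. A minimal sketch (the title here is illustrative):

```yaml
title: System database metrics   # illustrative
type: metrics                    # data streams of type "logs" are not eligible for TSDS
```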
Each field belonging to the set of fields that uniquely identify a document is a dimension. For more details, refer to Dimensions.
To set a field as a dimension, simply add `dimension: true` to its mapping:
```yaml
- name: ApiId
  type: keyword
  dimension: true
```
A field of type `flattened` cannot be selected as a dimension. If the field you choose as a dimension is too long or is of type `flattened`, consider hashing its value and using the result as the dimension. The fingerprint processor can be used for this purpose.
You can find an example in the Oracle Integration TSDS Enablement Example.
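For illustration, a sketch of what such a fingerprint processor could look like in a data stream's ingest pipeline (for example, `elasticsearch/ingest_pipeline/default.yml`); the field names here are hypothetical:

```yaml
processors:
  # Hash a long or flattened field into a fixed-length value that can
  # safely be used as a dimension (field names are hypothetical).
  - fingerprint:
      fields:
        - oracle.tablespace.labels
      target_field: oracle.tablespace.labels_fingerprint
      method: SHA-256
      ignore_missing: true
```

The `target_field` can then be mapped as a `keyword` with `dimension: true`.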
Important considerations:
- There is a limit on how many dimension fields a data stream can have. By default, this value is 21. You can adjust this restriction by altering the `index.mapping.dimension_fields.limit` setting:

  ```yaml
  elasticsearch:
    index_template:
      settings:
        index.mapping.dimension_fields.limit: 321
  ```
- Dimension keys have a hard limit of 512 bytes. Documents are rejected if this limit is reached.
- Dimension values have a hard limit of 1024 bytes. Documents are rejected if this limit is reached.
There are fields that are part of every package, and they are potential candidates for becoming dimension fields:

- `host.name`
- `service.address`
- `agent.id`
- `container.id`
For products that can run both on-premises and in a public cloud environment (by being deployed on public cloud virtual machines), it is recommended to annotate the following ECS fields as dimensions:

- `host.name`
- `service.address`
- `container.id`
- `cloud.account.id`
- `cloud.provider`
- `cloud.region`
- `cloud.availability_zone`
- `agent.id`
- `cloud.instance.id`
For products operating as managed services within cloud providers such as AWS, Azure, and GCP, it is advised to label the following fields as dimensions:

- `cloud.account.id`
- `cloud.region`
- `cloud.availability_zone`
- `cloud.provider`
- `agent.id`
Note that for some packages some of these fields do not hold any value, so make sure to annotate only the ones that are actually populated.
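As a sketch (assuming the package declares ECS fields with `external: ecs`; the exact file layout varies by package), annotating these fields as dimensions in a data stream's `fields/ecs.yml` might look like this:

```yaml
- external: ecs
  name: cloud.account.id
  dimension: true
- external: ecs
  name: cloud.region
  dimension: true
- external: ecs
  name: agent.id
  dimension: true
```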
The `fields.yml` file has the field mappings specific to a data stream of an integration. Some of these fields might need to be set as dimensions if the set of ECS dimension fields is not enough to create a unique `_tsid`.
Adding an inline comment prior to the dimension annotation is advised, detailing the rationale behind the choice of a particular field as a dimension field:
```yaml
- name: wait_class
  type: keyword
  # Multiple events are generated based on the values of wait_class. Hence, it is a dimension.
  dimension: true
  description: Every wait event belongs to a class of wait events.
```
Metrics are fields that contain numeric measurements, as well as aggregations and/or downsampled values based on those measurements. Annotate each metric with the correct metric type. The currently supported values are `gauge`, `counter`, and `null`.
Example of adding a metric type to a field:
```yaml
- name: compactions_failed
  type: double
  metric_type: counter
  description: |
    Counter of TSM compactions by level that have failed due to error.
```
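For comparison, a `gauge` fits measurements that can go up and down rather than only increase; a hypothetical example:

```yaml
- name: memory_usage_bytes
  type: long
  metric_type: gauge
  description: Current memory usage in bytes (illustrative field).
```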
Some aggregation functions are not supported for certain `metric_type` values. In such a scenario, revisit whether the `metric_type` you selected is indeed correct for that field. If it is, please create an issue in elastic/elasticsearch explaining the use case.
Modify the `kibana.version` to at least `8.8.0` in the `manifest.yml` file of the package:
```yaml
conditions:
  kibana.version: "^8.8.0"
```
Add the following to the `manifest.yml` file of the data stream to enable the time series index mode:
```yaml
elasticsearch:
  index_mode: "time_series"
```
- If the number of dimensions is insufficient, data will be lost. Consider testing this using the TSDS migration test kit.
- Verify that the dashboards render the data properly. If certain visualisations do not work, consider migrating them to Lens. Remember that certain aggregation functions are not supported when a field has metric type `counter`, for example `avg()`. Replace such aggregation functions with a supported aggregation type, such as `max()` or `min()`.
- Use Lens as the preferred visualisation type.
- Always assess the number of unique values a field selected as a dimension would hold, especially if it is a numeric field. A field that holds millions of unique values may not be an ideal candidate for becoming a dimension.
- If the dimension field value is very long (the maximum is 1024 bytes), consider transforming it to a hash representation. The fingerprint processor can be used for this purpose.
- In the field mapping files, add an inline comment above each dimension field stating the reason for selecting it as a dimension.
- As part of TSDS migration testing, you may discover other errors which may be unrelated to TSDS migration. Keep the pull request for TSDS migration free from such changes. This helps in obtaining quick PR approval.
If you notice that metrics data is being dropped from an index after enabling TSDS, the TSDS migration test kit can be used as a helpful debugging tool.
Fields with a conflicting field type are not considered dimensions. Resolve the field type ambiguity before defining a field as a dimension.
When mappings are modified for a data stream, an index rollover happens and a new index is created under the data stream. Even though a new index exists, the data continues to go to the old index until the timestamp matches the `index.time_series.start_time` of the newly created index.
An enhancement request for Kibana has been filed to indicate the write index. Until then, refer to the `index.time_series.start_time` of the indices and compare it with the current time to identify the write index.
If you encounter this error (for reference, see integrations issue #7345 and elasticsearch PR #98518):
```
... (status=400): {"type":"illegal_argument_exception","reason":"the document timestamp [2023-08-07T00:00:00.000Z] is outside of ranges of currently writable indices [[2023-08-07T08:55:38.000Z,2023-08-07T12:55:38.000Z]]"}, dropping event!
```
Consider:

- Defining the `look_ahead_time` or `look_back_time` for each data stream. For example:

  ```yaml
  elasticsearch:
    index_mode: "time_series"
    index_template:
      settings:
        index.look_ahead_time: "10h"
  ```

  Note: Updating the package with this change does not cause an automatic rollover on the data stream. You have to do that manually.

- Updating the `timestamp` of the document being rejected.
- Finding a fix to receive the document without a delay.