Transform and enrich data
For many use cases you'll want to enhance your data as it's being ingested into Elasticsearch. You might want to filter out unwanted parts of the data before it gets indexed, or reshape it, for example by processing incoming log files so that they conform to a standard format.
You might also want to enrich your data with additional information, for example to append product information based on product IDs or vendor information based on a set of known IP addresses.
Depending on your use case, you may want to control the structure of your ingested data by customizing how Elasticsearch maps an incoming document to fields and data types.
Finally, to help ensure optimal query results, you may want to customize how text is analyzed and how text fields are defined inside Elasticsearch.
- Elastic Agent processors

You can use Elastic Agent processors to sanitize or enrich raw data at the source. Use Elastic Agent processors if you need to control what data is sent across the wire, or if you need to enrich the raw data with information available on the host.
- Elasticsearch ingest pipelines

You can use Elasticsearch ingest pipelines to enrich incoming data or normalize field data before the data is indexed. Elasticsearch ingest pipelines enable you to manipulate the data as it comes in. This approach helps you avoid adding processing overhead to the hosts from which you're collecting data.

When you define a pipeline, you can configure one or more processors to operate on the incoming data. A typical use case is to transform specific strings to lowercase, or to sort the elements of incoming arrays into a given order. This section describes:
- How to create, view, edit, and delete an ingest pipeline
- How to set up processors to transform the data
- How to test a pipeline before putting it into production
You can try out the Parse logs example, which shows you how to set up an ingest pipeline to transform incoming server logs into a standard format.
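For example, the following sketch uses the Python Elasticsearch client to create a pipeline with two processors and then test it with the simulate API before putting it into production. The endpoint, API key, pipeline name, field names, and sample document are placeholders chosen for illustration, not part of any existing setup.

```python
from elasticsearch import Elasticsearch

# Connect to your deployment (endpoint and API key are placeholders).
es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")

# Create a pipeline that lowercases the log level and sorts the tags array.
es.ingest.put_pipeline(
    id="my-logs-pipeline",  # hypothetical pipeline name
    description="Normalize log level and tag order",
    processors=[
        {"lowercase": {"field": "log.level"}},
        {"sort": {"field": "tags", "order": "asc"}},
    ],
)

# Run a sample document through the pipeline without indexing it.
response = es.ingest.simulate(
    id="my-logs-pipeline",
    docs=[{"_source": {"log": {"level": "WARN"}, "tags": ["web", "auth"]}}],
)
print(response["docs"][0]["doc"]["_source"])
```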
The Elasticsearch enrich processor enables you to add data from existing indices to your incoming data, based on an enrich policy. The enrich policy contains a set of rules to match incoming documents to the fields containing the data to add. Refer to Data enrichment to learn how to set up an enrich processor. You can also try out a few examples that show how to enrich data based on geographic location, exact values such as email addresses or IDs, or a range of values such as a date or set of IP addresses.
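As a sketch of how the pieces fit together, the snippet below creates a match-type enrich policy backed by a hypothetical products index, executes it, and references it from an enrich processor in a pipeline. The index, policy, pipeline, and field names are assumptions for illustration.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")

# Define a match enrich policy: look up documents in the hypothetical
# "products" source index by product_id and expose the listed enrich fields.
es.enrich.put_policy(
    name="product-info-policy",  # hypothetical policy name
    match={
        "indices": "products",
        "match_field": "product_id",
        "enrich_fields": ["name", "category", "price"],
    },
)

# Build the policy's internal enrich index so a processor can use it.
es.enrich.execute_policy(name="product-info-policy")

# Reference the policy from an enrich processor: incoming documents get a
# "product" object appended based on their product_id value.
es.ingest.put_pipeline(
    id="orders-enrich-pipeline",  # hypothetical pipeline name
    processors=[
        {
            "enrich": {
                "policy_name": "product-info-policy",
                "field": "product_id",
                "target_field": "product",
            }
        }
    ],
)
```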
- Logstash and the Logstash elastic_integration filter

If you're using Logstash as your primary ingest tool, you can take advantage of its built-in pipeline capabilities to transform your data. You configure a pipeline by stringing together a series of input, output, filter, and optional codec plugins to manipulate all incoming data.

If you're ingesting using Elastic Agent with Elastic integrations, you can use the Logstash elastic_integration filter and other Logstash filters to extend Elastic integrations by transforming data before it goes to Elasticsearch.

- Index mapping
Index mapping lets you control the structure that incoming data has within an Elasticsearch index. You can define all of the fields that are included in the index and their respective data types. For example, you can set fields for dates, numbers, or geolocations, and define the fields to have specific formats.
Ingested data can be mapped dynamically, where Elasticsearch adds all fields automatically based on the detected data types, or explicitly, where Elasticsearch maps the incoming data to fields based on your custom rules.
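For instance, a minimal sketch of an explicit mapping with the Python client might look like the following. The index and field names are hypothetical, and any field not listed here would still be added dynamically by default.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")

# Create an index with explicit mappings for text, numbers, dates, and
# geolocations, including a specific date format.
es.indices.create(
    index="products",  # hypothetical index name
    mappings={
        "properties": {
            "name": {"type": "text"},
            "price": {"type": "scaled_float", "scaling_factor": 100},
            "created_at": {"type": "date", "format": "yyyy-MM-dd"},
            "location": {"type": "geo_point"},
        }
    },
)
```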
You can use Elasticsearch runtime fields to define or alter the schema at query time. You can start working with your data without needing to understand how it is structured, add fields to existing documents without reindexing your data, override the value returned from an indexed field, or define fields for a specific use without modifying the underlying schema.
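A brief sketch of a runtime field defined at query time follows, assuming documents with an indexed numeric duration_s field; the index name, field names, and script are illustrative assumptions.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")

# Define a runtime field in the search request itself: no reindexing is
# needed, and the underlying schema of the index is left unchanged.
response = es.search(
    index="server-logs",  # hypothetical index name
    runtime_mappings={
        "duration_ms": {
            "type": "double",
            "script": {"source": "emit(doc['duration_s'].value * 1000.0)"},
        }
    },
    query={"range": {"duration_ms": {"gte": 500}}},
    fields=["duration_ms"],
)
print(response["hits"]["total"])
```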
Refer to the Index mapping pages to learn about the dynamic mapping rules that Elasticsearch runs by default, which ones you can customize, and how to configure your own explicit data-to-field mappings.

- Text analysis
Like index mapping, text analysis is another form of data transformation that runs on data as it's being ingested. This process analyzes incoming, unstructured text and organizes it in a way that ensures a text query matches all relevant documents, not just exact string matches.
Refer to the Text analysis pages to learn how to configure an analyzer to run on incoming text. You can opt to use one of several built-in analyzers, or create a custom analyzer for specific use cases.
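As an illustration, the sketch below creates an index whose text field uses a hypothetical custom analyzer, then calls the analyze API to inspect how a string is tokenized. The index name, analyzer name, token filters, and sample text are assumptions, not a prescribed configuration.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")

# Create an index whose "body" text field uses a custom analyzer that splits
# on word boundaries, lowercases the tokens, and strips accents.
es.indices.create(
    index="articles",  # hypothetical index name
    settings={
        "analysis": {
            "analyzer": {
                "folded_text": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    mappings={
        "properties": {
            "body": {"type": "text", "analyzer": "folded_text"}
        }
    },
)

# Check how a sample string is broken into tokens by the custom analyzer.
tokens = es.indices.analyze(
    index="articles", analyzer="folded_text", text="Caffè Latte Recipes"
)
print([t["token"] for t in tokens["tokens"]])
```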