User agent processor
The user_agent processor extracts details from the user agent string a browser sends with its web requests. By default, it adds this information under the user_agent field.
The user-agent module ships by default with the regexes.yaml made available by uap-java with an Apache 2.0 license. For more details see https://github.com/ua-parser/uap-core.
| Name | Required | Default | Description |
|---|---|---|---|
| field | yes | - | The field containing the user agent string. |
| target_field | no | user_agent | The field that will be filled with the user agent details. |
| regex_file | no | - | The name of the file in the config/user-agent directory containing the regular expressions for parsing the user agent string. Both the directory and the file have to be created before starting Elasticsearch. If not specified, the user-agent module uses the regexes.yaml file from the uap-core package that it ships with (see below). Before version 9.4, this directory was named config/ingest-user-agent. |
| properties | no | [name, os, device, original, version] | Controls what properties are added to target_field. |
| extract_device_type | no | false | |
| ignore_missing | no | false | If true and field does not exist, the processor quietly exits without modifying the document. |
Here is an example that adds the user agent details to the user_agent field based on the agent field:
```console
PUT _ingest/pipeline/user_agent
{
  "description" : "Add user agent information",
  "processors" : [
    {
      "user_agent" : {
        "field" : "agent"
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=user_agent
{
  "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
}

GET my-index-000001/_doc/my_id
```
This returns:
```json
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 22,
  "_primary_term": 1,
  "_source": {
    "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
    "user_agent": {
      "name": "Chrome",
      "original": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
      "version": "51.0.2704.103",
      "os": {
        "name": "Mac OS X",
        "version": "10.10.5",
        "full": "Mac OS X 10.10.5"
      },
      "device" : {
        "name" : "Mac"
      }
    }
  }
}
```
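The optional parameters from the options table can be combined in a single processor definition. The following is a minimal sketch, not a recommended configuration: the pipeline name user_agent_subset and the target field ua are hypothetical names chosen for illustration. It limits the extracted properties to name, os, and version, and sets ignore_missing so that documents without an agent field are left unmodified. The second request uses the simulate pipeline API to try the pipeline against two sample documents without indexing them.

```console
# Hypothetical pipeline name (user_agent_subset) and target field (ua).
PUT _ingest/pipeline/user_agent_subset
{
  "description": "Add a subset of user agent details under a custom field",
  "processors": [
    {
      "user_agent": {
        "field": "agent",
        "target_field": "ua",
        "properties": ["name", "os", "version"],
        "ignore_missing": true
      }
    }
  ]
}

# The second sample document has no agent field.
POST _ingest/pipeline/user_agent_subset/_simulate
{
  "docs": [
    {
      "_source": {
        "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
      }
    },
    {
      "_source": {
        "message": "no user agent here"
      }
    }
  ]
}
```

With ignore_missing left at its default of false, a document that lacks the agent field would instead cause the processor to fail.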
To use a custom regex file for parsing user agents, put the file in the config/user-agent directory and give it a .yml filename extension. The file has to be present at node startup; changes to it, and new files added while the node is running, have no effect.
Before version 9.4, this directory was named config/ingest-user-agent. The old directory name is still supported as a fallback but is deprecated.
In practice, it will make most sense for any custom regex file to be a variant of the default file, either a more recent version or a customised version.
The default file included in the user-agent module is regexes.yaml from the uap-core package: https://github.com/ua-parser/uap-core/blob/master/regexes.yaml
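As a sketch of how a custom file is selected, a processor can name it through the regex_file parameter. The file name my-regexes.yml and the pipeline name user_agent_custom_regex are hypothetical; the example assumes such a file was placed in the config/user-agent directory before the node started.

```console
# Assumes a hypothetical file config/user-agent/my-regexes.yml
# that was present when the node started.
PUT _ingest/pipeline/user_agent_custom_regex
{
  "description": "Parse user agents with a custom regex file",
  "processors": [
    {
      "user_agent": {
        "field": "agent",
        "regex_file": "my-regexes.yml"
      }
    }
  ]
}
```

Processors that omit regex_file keep using the bundled regexes.yaml, so a custom file and the default can be used side by side in different pipelines.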
The user_agent processor supports the following settings:
user_agent.cache_size: The maximum number of results that should be cached. Defaults to 1000.

ingest.user_agent.cache_size: Deprecated in 9.4. Use user_agent.cache_size instead. The maximum number of results that should be cached. Defaults to 1000.
Note that these settings are node settings and apply to all user_agent processors, i.e. there is one cache for all defined user_agent processors.