Regular expression syntax
A regular expression is a way to match patterns in data using placeholder characters, called operators.
Elasticsearch supports regular expressions in the following queries:
Elasticsearch uses Apache Lucene's regular expression engine to parse these queries.
Lucene’s regular expression engine supports all Unicode characters. However, the following characters are reserved as operators:
. ? + * | { } [ ] ( ) " \
Depending on the optional operators enabled, the following characters may also be reserved:
# @ & < > ~
To use one of these characters literally, escape it with a preceding backslash or surround it with double quotes. For example:
\@ 1
\\ 2
"john@smith.com" 3
- renders as a literal '@'
- renders as a literal '\'
- renders as 'john@smith.com'
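The backslash escaping rule above can be applied programmatically when a pattern is built from arbitrary text. The following is a minimal sketch, not part of any Elasticsearch client: `escape_lucene_regex` and `RESERVED` are hypothetical names, and the sketch assumes that escaping an optional operator character that happens not to be enabled is harmless.

```python
# Characters the text lists as always reserved, plus those that are
# reserved only when optional operators are enabled.
RESERVED = set('.?+*|{}[]()"\\') | set('#@&<>~')


def escape_lucene_regex(text: str) -> str:
    """Prefix every reserved character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


print(escape_lucene_regex("john@smith.com"))  # john\@smith\.com
print(escape_lucene_regex("a\\b"))            # a\\b
```

The result is the regular expression itself. It still needs JSON escaping before it is placed in a request body.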
The backslash is an escape character in both JSON strings and regular expressions. You need to escape both backslashes in a query, unless you use a language client, which takes care of this. For example, the string a\b needs to be indexed as "a\\b":
PUT my-index-000001/_doc/1
{
"my_field": "a\\b"
}
This document matches the following regexp query:
GET my-index-000001/_search
{
"query": {
"regexp": {
"my_field.keyword": "a\\\\.*"
}
}
}
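The two layers of escaping in the requests above can be reproduced with Python's standard `json` module. This is a sketch for illustration only: it builds the request bodies as strings and does not send them anywhere. Note that Python string literals add a third layer of backslash escaping of their own.

```python
import json

# The three-character string a\b (the Python literal escapes the backslash).
term = "a\\b"

# Document body: json.dumps doubles the backslash for JSON.
doc_body = json.dumps({"my_field": term})
print(doc_body)  # {"my_field": "a\\b"}

# Regular expression a\\.* : 'a', an escaped literal backslash, then anything.
pattern = "a" + "\\\\" + ".*"

# Query body: JSON encoding doubles each of the two backslashes again.
query_body = json.dumps({"query": {"regexp": {"my_field.keyword": pattern}}})
print(query_body)  # {"query": {"regexp": {"my_field.keyword": "a\\\\.*"}}}
```

One literal backslash in the indexed value therefore becomes two in the regular expression and four in the JSON text of the query.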
Lucene’s regular expression engine does not use the Perl Compatible Regular Expressions (PCRE) library, but it does support the following standard operators.
.
- Matches any character. For example:
ab. 1
- matches 'aba', 'abb', 'abz', etc.
?
- Repeat the preceding character zero or one times. Often used to make the preceding character optional. For example:
abc? 1
- matches 'ab' and 'abc'
+
- Repeat the preceding character one or more times. For example:
ab+ 1
- matches 'ab', 'abb', 'abbb', etc.
*
- Repeat the preceding character zero or more times. For example:
ab* 1
- matches 'a', 'ab', 'abb', 'abbb', etc.
{}
- Minimum and maximum number of times the preceding character can repeat. For example:
a{2} 1
a{2,4} 2
a{2,} 3
- matches 'aa'
- matches 'aa', 'aaa', and 'aaaa'
- matches 'a' repeated two or more times
|
- OR operator. The match will succeed if the longest pattern on either the left side OR the right side matches. For example:
abc|xyz 1
- matches 'abc' and 'xyz'
( … )
- Forms a group. You can use a group to treat part of the expression as a single character. For example:
abc(def)? 1
- matches 'abc' and 'abcdef' but not 'abcd'
[ … ]
- Match one of the characters in the brackets. For example:
[abc] 1
- matches 'a', 'b', 'c'
Inside the brackets, - indicates a range unless - is the first character or escaped. For example:
[a-c] 1
[-abc] 2
[abc\-] 3
- matches 'a', 'b', or 'c'
- '-' is first character. Matches '-', 'a', 'b', or 'c'
- Escapes '-'. Matches 'a', 'b', 'c', or '-'
A ^ before a character in the brackets negates the character or range. For example:
[^abc] 1
[^a-c] 2
[^-abc] 3
[^abc\-] 4
- matches any character except 'a', 'b', or 'c'
- matches any character except 'a', 'b', or 'c'
- matches any character except '-', 'a', 'b', or 'c'
- matches any character except 'a', 'b', 'c', or '-'
You can use the flags parameter to enable more optional operators for Lucene’s regular expression engine.
To enable multiple operators, use a | separator. For example, a flags value of COMPLEMENT|INTERVAL enables the COMPLEMENT and INTERVAL operators.
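As an illustration of combining flag values, the sketch below builds a regexp query body that uses the long form of the query, with `value` and `flags` parameters. The helper name `regexp_query` and the field name are assumptions made for this example, and the body is only printed, not sent.

```python
import json


def regexp_query(field, pattern, operators):
    """Build a regexp query body with the given optional operators enabled.

    Hypothetical helper. An empty list maps to NONE, because an empty
    flags string would enable every optional operator.
    """
    flags = "|".join(operators) if operators else "NONE"
    return {"query": {"regexp": {field: {"value": pattern, "flags": flags}}}}


body = regexp_query("my_field.keyword", "foo<1-100>", ["COMPLEMENT", "INTERVAL"])
print(json.dumps(body, indent=2))
```

Mapping an empty list to NONE is a design choice of this sketch, so that passing no operators cannot silently enable all of them.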
ALL
- (Default) Enables all optional operators.
""
- (empty string) Alias for the ALL value.
COMPLEMENT
- Enables the ~ operator. You can use ~ to negate the shortest following pattern. For example:
a~bc 1
- matches 'adc' and 'aec' but not 'abc'
EMPTY
- Enables the # (empty language) operator. The # operator doesn’t match any string, not even an empty string.
If you create regular expressions by programmatically combining values, you can pass # to specify "no string." This lets you avoid accidentally matching empty strings or other unwanted strings. For example:
#|abc 1
- matches 'abc' but nothing else, not even an empty string
INTERVAL
- Enables the <> operators. You can use <> to match a numeric range. For example:
foo<1-100> 1
foo<01-100> 2
- matches 'foo1', 'foo2' ... 'foo99', 'foo100'
- matches 'foo01', 'foo02' ... 'foo99', 'foo100'
INTERSECTION
- Enables the & operator, which acts as an AND operator. The match will succeed if the patterns on both the left side AND the right side match. For example:
aaa.+&.+bbb 1
- matches 'aaabbb'
ANYSTRING
- Enables the @ operator. You can use @ to match any entire string.
You can combine the @ operator with the & and ~ operators to create an "everything except" logic. For example:
@&~(abc.+) 1
- matches everything except terms beginning with 'abc'
NONE
- Disables all optional operators.
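The programmatic use of # described under EMPTY can be sketched as follows. The helpers `escape`, `any_of`, and `regexp_query` are hypothetical names introduced for illustration. The sketch assumes the values are literal strings that should be matched exactly, and it enables the EMPTY flag explicitly so that # is treated as an operator.

```python
# Reserved characters as listed earlier in this section.
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape(text):
    """Backslash-escape reserved characters (hypothetical helper)."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def any_of(values):
    """Return a pattern matching any of the literal values, or '#' if there are none."""
    literals = [escape(v) for v in values if v]  # skip empty strings
    return "|".join(literals) if literals else "#"


def regexp_query(field, values):
    # EMPTY is enabled explicitly so '#' is treated as an operator.
    return {"query": {"regexp": {field: {"value": any_of(values), "flags": "EMPTY"}}}}


print(any_of(["abc", "x.y"]))  # abc|x\.y
print(any_of([]))              # #
```

Without the # fallback, an empty input list would produce an empty pattern instead of one that is guaranteed to match nothing.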
Lucene’s regular expression engine does not support anchor operators, such as ^ (beginning of line) or $ (end of line). To match a term, the regular expression must match the entire string.
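The whole-string requirement can be illustrated by analogy with Python's `re` module. Python's engine is not Lucene's and the two syntaxes differ in many details, so this sketch only demonstrates the difference between an unanchored search and a whole-string match, using patterns simple enough to mean the same thing in both.

```python
import re

term = "abcde"

# Unanchored search: finds "bcd" somewhere inside the term.
print(bool(re.search("bcd", term)))         # True

# Whole-string match, analogous to how a pattern is applied to a term here.
print(bool(re.fullmatch("bcd", term)))      # False

# To match a substring under whole-string semantics, pad the pattern with .*
print(bool(re.fullmatch(".*bcd.*", term)))  # True
```

In the same way, a pattern meant to match terms containing 'bcd' would be written as .*bcd.* rather than bcd.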