A regular expression is used for matching input strings. It consists of primary expressions and of expressions combined with the regular operators: alternation ('|'), catenation (written by juxtaposition), and repetition ('*', '+', and '?'). The whole regular expression is enclosed in double quotes:
regular‑expression → " alternative "
Expression a | b matches input strings that match either expression a or expression b:
alternative → catenation (| catenation)*
Expression ab matches input strings that consist of a string matching a followed by a string matching b:
catenation → repetition repetition*
Expression a* matches input strings that match a zero or more times. Expression a+ matches input strings that match a one or more times. Expression a? matches input strings that match a zero times or once:
repetition → primary (* | + | ?)?
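The semantics of alternation, catenation, and repetition can be illustrated with a small Python sketch. It is not the lexer generator's implementation: the combinators `lit`, `alt`, `cat`, `star`, `plus`, and `opt` are hypothetical, and each matcher simply returns the set of input positions at which a match starting at a given position can end. Alternation is then a set union, catenation feeds every end position of a into b, and a* iterates a until no new end positions appear.

```python
from typing import Callable, Set

# A matcher maps (text, start position) to the set of positions where a match can end.
Matcher = Callable[[str, int], Set[int]]


def lit(c: str) -> Matcher:
    """Match the single character c."""
    return lambda s, i: {i + 1} if i < len(s) and s[i] == c else set()


def alt(a: Matcher, b: Matcher) -> Matcher:
    """a | b: a match of either a or b."""
    return lambda s, i: a(s, i) | b(s, i)


def cat(a: Matcher, b: Matcher) -> Matcher:
    """ab: a match of a followed by a match of b."""
    return lambda s, i: {k for j in a(s, i) for k in b(s, j)}


def star(a: Matcher) -> Matcher:
    """a*: zero or more matches of a, computed as a fixed point."""
    def m(s: str, i: int) -> Set[int]:
        ends, frontier = {i}, {i}
        while frontier:
            # Extend every newly reached position by one more match of a.
            frontier = {k for j in frontier for k in a(s, j)} - ends
            ends |= frontier
        return ends
    return m


def plus(a: Matcher) -> Matcher:
    """a+: one or more matches of a, i.e. a followed by a*."""
    return cat(a, star(a))


def opt(a: Matcher) -> Matcher:
    """a?: zero matches or one match of a."""
    return lambda s, i: {i} | a(s, i)


def full_match(m: Matcher, s: str) -> bool:
    """True if m matches the whole input string."""
    return len(s) in m(s, 0)


# Hand-built equivalent of the expression "(a|b)c+d?"
expr = cat(cat(alt(lit("a"), lit("b")), plus(lit("c"))), opt(lit("d")))
print(full_match(expr, "bccd"))  # True
print(full_match(expr, "ac"))    # True
print(full_match(expr, "abc"))   # False
```

Working with sets of end positions avoids backtracking, and the loop in `star` terminates even when a can match the empty string, because positions are bounded by the input length.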
A primary expression is a parenthesized expression, an ordinary character or escape sequence, a character class, an expression reference, or a dot expression. A dot expression matches any single character:
primary → ( alternative ) | char | class | expression‑reference | .
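Because every rule above has a single nonterminal on its left-hand side, the grammar maps directly onto a recursive-descent parser with one method per rule. The Python sketch below is illustrative only: the tuple-based AST is hypothetical, character classes and expression references are left out, and an escape sequence is simplified to a backslash followed by a character that is taken literally.

```python
# Characters that may not appear unescaped as ordinary characters (rule `char`).
SPECIAL = set('\n\r\\{}()[]|*+?."')


class Parser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, text: str) -> None:
        self.s, self.i = text, 0

    def peek(self) -> str:
        return self.s[self.i] if self.i < len(self.s) else ""

    def expect(self, c: str) -> None:
        if self.peek() != c:
            raise SyntaxError(f"expected {c!r} at position {self.i}")
        self.i += 1

    def regular_expression(self):
        # regular-expression -> " alternative "
        self.expect('"')
        node = self.alternative()
        self.expect('"')
        if self.i != len(self.s):
            raise SyntaxError("trailing input after closing quote")
        return node

    def alternative(self):
        # alternative -> catenation (| catenation)*
        items = [self.catenation()]
        while self.peek() == "|":
            self.i += 1
            items.append(self.catenation())
        return items[0] if len(items) == 1 else ("alt", items)

    def catenation(self):
        # catenation -> repetition repetition*
        items = [self.repetition()]
        while self.peek() not in ("", "|", ")", '"'):
            items.append(self.repetition())
        return items[0] if len(items) == 1 else ("cat", items)

    def repetition(self):
        # repetition -> primary (* | + | ?)?
        node = self.primary()
        ops = {"*": "star", "+": "plus", "?": "opt"}
        if self.peek() in ops:
            node = (ops[self.peek()], node)
            self.i += 1
        return node

    def primary(self):
        # primary -> ( alternative ) | char | .   (class and reference omitted)
        c = self.peek()
        if c == "(":
            self.i += 1
            node = self.alternative()
            self.expect(")")
            return node
        if c == ".":
            self.i += 1
            return ("any",)
        if c == "\\" and self.i + 1 < len(self.s):
            # Simplification: the escaped character is taken literally.
            self.i += 2
            return ("char", self.s[self.i - 1])
        if c and c not in SPECIAL:
            self.i += 1
            return ("char", c)
        raise SyntaxError(f"unexpected {c!r} at position {self.i}")


print(Parser('"(a|b)c*"').regular_expression())
# ('cat', [('alt', [('char', 'a'), ('char', 'b')]), ('star', ('char', 'c'))])
```

The loop in `catenation` stops at '|', ')', or the closing quote, which is what gives catenation a higher precedence than alternation; repetition binds tighter still because it is applied directly to a single primary.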
An ordinary character is any character other than a newline, a carriage return, a backslash, an operator or punctuation character, or a double quote:
char → [ ^ \n\r\\{}()[\]|*+?." ] | escape
An escape sequence starts with the backslash character. It removes the operator meaning from an operator symbol, so that the symbol is matched literally. Escape sequences are also used to represent non-printable characters and to match parentheses, brackets, and braces literally.
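Given the `char` rule and the escaping rules above, a helper that turns an arbitrary literal string into an expression matching exactly that string only has to escape the excluded characters. This Python sketch assumes that a backslash followed by an operator or punctuation character stands for that character, and that newline and carriage return are written `\n` and `\r` (the notation used in the `char` rule). The complete set of escape sequences the tool accepts is not given here, and `escape_literal` is a hypothetical name.

```python
# Characters excluded from the `char` rule, apart from newline and carriage return.
OPERATORS_AND_PUNCTUATION = set('\\{}()[]|*+?."')

# Assumption: C-style escapes for the non-printable characters excluded from `char`.
CONTROL_ESCAPES = {"\n": "\\n", "\r": "\\r"}


def escape_literal(text: str) -> str:
    """Return a quoted regular expression that matches `text` literally (sketch)."""
    out = []
    for c in text:
        if c in CONTROL_ESCAPES:
            out.append(CONTROL_ESCAPES[c])
        elif c in OPERATORS_AND_PUNCTUATION:
            out.append("\\" + c)  # the backslash removes the operator meaning
        else:
            out.append(c)  # ordinary character, matched as itself
    return '"' + "".join(out) + '"'


print(escape_literal("a+b"))   # "a\+b"
print(escape_literal("f(x)"))  # "f\(x\)"
```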
A character class consists of characters and ranges of characters enclosed in brackets; it may also contain escape sequences. A character class matches a single input character that is one of the enumerated characters or falls within one of the ranges. If the class starts with the caret symbol (^), it instead matches any single character that is not among the enumerated characters and ranges:
class → [ ^? range* ]
range → class‑char (- class‑char)?
class‑char → char | [ ()[{}|*+?.^- ]
Parentheses, the left bracket, braces, and the operator symbols need not be escaped within a character class, although escaping them does no harm. The right bracket, the backslash, and the double quote must still be escaped.
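A character class can be compiled into a membership test. The following Python sketch parses the body of a class according to the class, range, and class‑char rules and returns a predicate over single characters. It assumes that a backslash inside a class makes the next character literal (the precise escape sequences are not specified here) and that reversed ranges such as `z-a` are errors; `compile_class` is a hypothetical name.

```python
from typing import Callable, List, Tuple


def compile_class(cls: str) -> Callable[[str], bool]:
    """Compile a class such as "[^a-z_]" into a single-character predicate."""
    if len(cls) < 2 or cls[0] != "[" or cls[-1] != "]":
        raise ValueError("a character class is enclosed in brackets")
    body, i = cls[1:-1], 0
    negated = body.startswith("^")  # class -> [ ^? range* ]
    if negated:
        i = 1

    def class_char() -> str:
        # class-char: ordinary or escaped character, or one of ()[{}|*+?.^-
        nonlocal i
        if body[i] == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash in class")
            i += 2
            return body[i - 1]  # assumption: the escaped character is literal
        if body[i] in '\n\r"]':
            raise ValueError(f"{body[i]!r} must be escaped in a class")
        i += 1
        return body[i - 1]

    ranges: List[Tuple[str, str]] = []
    while i < len(body):
        lo = hi = class_char()
        # range -> class-char (- class-char)?; a trailing '-' is a plain character
        if i + 1 < len(body) and body[i] == "-":
            i += 1
            hi = class_char()
        if lo > hi:
            raise ValueError(f"reversed range {lo}-{hi}")  # assumption
        ranges.append((lo, hi))

    def matches(c: str) -> bool:
        inside = any(lo <= c <= hi for lo, hi in ranges)
        return inside != negated  # the caret inverts the result
    return matches


ident_start = compile_class("[a-zA-Z_]")
not_digit = compile_class("[^0-9]")
print(ident_start("q"), ident_start("7"))  # True False
print(not_digit("x"), not_digit("5"))      # True False
print(compile_class("[+*?()]")("*"))       # True: no escaping needed inside a class
```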
An identifier enclosed in braces references a named regular expression that is defined before the current regular expression:
expression‑reference → { identifier }
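Since a reference may only name an expression defined earlier, references can be resolved in a single pass over the definitions in source order. The Python sketch below assumes definitions arrive as hypothetical (name, body) pairs without the enclosing double quotes, and that identifiers follow the usual letter, digit, and underscore form. As a further assumption, it wraps each substituted body in parentheses so that operator precedence is preserved, and it leaves braces alone when they are escaped or inside a character class, where they are ordinary class characters.

```python
from typing import Dict, List, Tuple


def expand_references(defs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Expand {identifier} references; each may only name an earlier definition."""
    expanded: Dict[str, str] = {}
    for name, body in defs:
        out, i, in_class = [], 0, False
        while i < len(body):
            c = body[i]
            if c == "\\":  # escape sequence: copy the backslash and next character
                out.append(body[i:i + 2])
                i += 2
            elif in_class:  # braces are ordinary characters inside a class
                in_class = c != "]"
                out.append(c)
                i += 1
            elif c == "[":
                in_class = True
                out.append(c)
                i += 1
            elif c == "{":
                j = body.find("}", i)
                ref = body[i + 1:j] if j != -1 else ""
                if not ref.isidentifier():  # assumption about identifier syntax
                    raise ValueError(f"malformed reference in {name!r}")
                if ref not in expanded:
                    raise ValueError(f"{ref!r} must be defined before {name!r}")
                out.append("(" + expanded[ref] + ")")  # keep precedence intact
                i = j + 1
            else:
                out.append(c)
                i += 1
        expanded[name] = "".join(out)
    return expanded


defs = [("digit", "[0-9]"), ("int", "{digit}+"), ("real", "{int}\\.{int}")]
print(expand_references(defs)["real"])  # (([0-9])+)\.(([0-9])+)
```

Rejecting a reference to a name that has not been expanded yet also rules out recursive definitions, which a regular expression cannot express anyway.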