Appendix C Regular Expression Syntax

A regular expression matches input strings. It is built from primary expressions combined with the regular operators '|' (alternation), catenation, '*', '+', and '?'. The whole regular expression is enclosed in double quotes:

regular‑expression → " alternative "

Expression a | b matches input strings that match either expression a or expression b:

alternative → catenation (| catenation)*

Expression ab matches an input string that consists of a string matching a followed by a string matching b:

catenation → repetition repetition*

Expression a* matches zero or more consecutive occurrences of strings that match a. Expression a+ matches one or more such occurrences. Expression a? matches zero or one occurrence:

repetition → primary (* | + | ?)?
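
The alternative, catenation, and repetition productions above can be sketched as a small recursive-descent parser. This is only an illustration, not the implementation behind this appendix: the names Parser, parse, and SPECIAL and the tuple-shaped syntax tree are assumptions, the input is the text between the enclosing double quotes, and primary expressions are reduced to parenthesized expressions, ordinary characters, and the dot.

```python
# Characters that are not ordinary characters (assumed from the char rule).
SPECIAL = set('\n\r\\{}()[]|*+?."')


class Parser:
    """Hypothetical recursive-descent parser for a subset of the grammar."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Return the next character, or None at the end of the input.
        return self.text[self.pos] if self.pos < len(self.text) else None

    def take(self):
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def alternative(self):
        # alternative: catenation ('|' catenation)*
        node = self.catenation()
        while self.peek() == '|':
            self.take()
            node = ('alt', node, self.catenation())
        return node

    def catenation(self):
        # catenation: repetition repetition*
        node = self.repetition()
        while self.peek() is not None and self.peek() not in '|)':
            node = ('cat', node, self.repetition())
        return node

    def repetition(self):
        # repetition: primary ('*' | '+' | '?')?
        node = self.primary()
        op = self.peek()
        if op is not None and op in '*+?':
            self.take()
            node = ({'*': 'star', '+': 'plus', '?': 'opt'}[op], node)
        return node

    def primary(self):
        # Simplified primary: '(' alternative ')' | '.' | ordinary character.
        ch = self.peek()
        if ch == '(':
            self.take()
            node = self.alternative()
            if self.peek() != ')':
                raise SyntaxError("expected ')' at position %d" % self.pos)
            self.take()
            return node
        if ch == '.':
            self.take()
            return ('any',)
        if ch is None or ch in SPECIAL:
            raise SyntaxError("unexpected input at position %d" % self.pos)
        return ('char', self.take())


def parse(text):
    """Parse the text between the double quotes into a nested tuple tree."""
    parser = Parser(text)
    node = parser.alternative()
    if parser.pos != len(text):
        raise SyntaxError("unexpected input at position %d" % parser.pos)
    return node
```

In this sketch, parse('ab|c*') groups the catenation ab before the alternation and attaches the star only to c, which mirrors the precedence the productions imply. An input such as a** is rejected, because the repetition rule allows at most one postfix operator.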

A primary expression is one of the following: a parenthesized expression, an ordinary character or an escape sequence, a character class, an expression reference, or a dot expression. A dot expression matches any single character:

primary → ( alternative ) | char | class | expression‑reference | .

Ordinary characters

An ordinary character is any character other than a newline, a carriage return, a backslash, an operator or punctuation character, or a double quote:

char → [ ^ \n\r\\{}()[\]|*+?." ] | escape

Escape sequences

An escape sequence starts with a backslash. It removes the operator meaning from an operator symbol, so that the symbol is matched literally. It is also used to represent non-printable characters and to match parentheses, brackets, and braces literally.

Character classes

A character class consists of characters and ranges of characters enclosed in brackets. It may also contain escape sequences. A character class matches a single input symbol that is one of the enumerated characters or falls within one of the ranges. If the class starts with a caret, it instead matches any symbol other than the enumerated characters and ranges:

class → [ ^? range* ]
range → class‑char (- class‑char)?
class‑char → char | [ ()[{}|*+?.^- ]

Parentheses, the left bracket, braces, and the operator symbols need not be escaped within a character class, although escaping them does no harm.
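
The matching rule for character classes can be sketched as follows. The functions parse_class and class_matches are hypothetical names chosen for this illustration. The sketch models only the quoting role of the backslash (the next character is taken literally); escapes for non-printable characters are not handled, because this appendix does not enumerate them. A hyphen that is the last character of the class is treated as a literal, which is also an assumption.

```python
def parse_class(text):
    """Split '[...]' into (negated, ranges); each range is a (low, high) pair."""
    if len(text) < 2 or text[0] != '[' or text[-1] != ']':
        raise ValueError("a character class must be enclosed in brackets")
    body = text[1:-1]
    negated = body.startswith('^')
    i = 1 if negated else 0

    def read_char():
        # Read one class character; a backslash quotes the next character.
        nonlocal i
        if body[i] == '\\':
            i += 1
            if i >= len(body):
                raise ValueError("dangling backslash in character class")
        ch = body[i]
        i += 1
        return ch

    ranges = []
    while i < len(body):
        low = read_char()
        high = low
        # A hyphen followed by another character forms a range low-high.
        if i + 1 < len(body) and body[i] == '-':
            i += 1
            high = read_char()
        ranges.append((low, high))
    return negated, ranges


def class_matches(text, ch):
    """Return True if the single character ch is matched by the class text."""
    negated, ranges = parse_class(text)
    hit = any(low <= ch <= high for low, high in ranges)
    # A leading caret inverts the result.
    return hit != negated
```

For example, class_matches('[a-c_]', 'b') is true and class_matches('[^0-9]', '5') is false. The class '[(*+]' matches a literal asterisk without any escaping, in line with the rule stated above.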

Expression references

An identifier in braces references a named regular expression that is defined before the regular expression currently being defined:

expression‑reference → { identifier }
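
One way to picture expression references is as textual substitution performed in definition order. The sketch below is an assumption made for illustration, not a description of how any particular tool resolves references: expand and define_all are hypothetical names, and each referenced expression is wrapped in parentheses so that it behaves as a single primary expression. Braces inside a character class or after a backslash are left alone.

```python
def expand(pattern, defined):
    """Replace each {name} in pattern with the parenthesized earlier definition."""
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\' and i + 1 < len(pattern):
            # Copy an escape sequence unchanged.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            # Inside a character class, braces are literal characters.
            if ch == ']':
                in_class = False
            out.append(ch)
            i += 1
            continue
        if ch == '[':
            in_class = True
        elif ch == '{':
            end = pattern.find('}', i)
            if end < 0:
                raise ValueError("unterminated expression reference")
            name = pattern[i + 1:end].strip()
            if name not in defined:
                # Only expressions defined earlier may be referenced.
                raise KeyError("expression '%s' is not defined yet" % name)
            out.append('(' + defined[name] + ')')
            i = end + 1
            continue
        out.append(ch)
        i += 1
    return ''.join(out)


def define_all(definitions):
    """Process (name, pattern) pairs in order, expanding references as we go."""
    defined = {}
    for name, pattern in definitions:
        defined[name] = expand(pattern, defined)
    return defined
```

With the definitions digit = [0-9] and number = {digit}+ given in that order, define_all expands number to ([0-9])+. Reversing the order raises an error, which reflects the rule that a referenced expression must come before the one that uses it.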