Parsing expressions

Choice expression

If p and q are parsing‑expressions, so is p | q, a choice expression that matches p or q in left-to-right order. If p matches, q is not tried at all. If p does not match, the parser backtracks and tries q.
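
The ordered-choice behaviour can be illustrated outside the parser generator with a small C++ sketch. The names Parser, Literal and Choice are hypothetical helpers written for this illustration, not part of the generated parser code, and the sketch works on characters of a string rather than on lexer tokens.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical parser type: returns true and advances pos on a match,
// returns false and leaves pos unchanged otherwise.
using Parser = std::function<bool(std::string_view, std::size_t&)>;

// Matches the literal text lit at the current position.
Parser Literal(std::string lit)
{
    return [lit](std::string_view in, std::size_t& pos) {
        if (in.substr(pos, lit.size()) != lit) return false;
        pos += lit.size();
        return true;
    };
}

// p | q: try p first; only if p fails, restore the position and try q.
Parser Choice(Parser p, Parser q)
{
    return [p, q](std::string_view in, std::size_t& pos) {
        const std::size_t save = pos;
        if (p(in, pos)) return true;   // q is not tried at all
        pos = save;                    // backtrack
        return q(in, pos);
    };
}
```

With Choice(Literal("a"), Literal("ab")) and the input "ab", only the first alternative is used and one character is consumed, even though the second alternative would have matched more input. The order of the alternatives therefore matters.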

Sequence expression

If p and q are parsing‑expressions, so is p q, a sequence expression that matches p followed by q in left-to-right order.

Difference expression

If p and q are parsing‑expressions, so is p - q, a difference expression. The difference expression matches if p matches but q does not.
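
A minimal C++ sketch of the difference expression follows. It assumes that q is tried at the same starting position as p, which the text above does not state explicitly. Parser, AnyChar, Char and Difference are hypothetical names for this illustration, and characters stand in for tokens.

```cpp
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical parser type: true and pos advanced on a match,
// false and pos unchanged otherwise.
using Parser = std::function<bool(std::string_view, std::size_t&)>;

// Matches any single character (a stand-in for a token).
bool AnyChar(std::string_view in, std::size_t& pos)
{
    if (pos >= in.size()) return false;
    ++pos;
    return true;
}

// Returns a parser that matches exactly the character c.
Parser Char(char c)
{
    return [c](std::string_view in, std::size_t& pos) {
        if (pos >= in.size() || in[pos] != c) return false;
        ++pos;
        return true;
    };
}

// p - q: succeeds where p matches, unless q also matches at the same start
// (assumption of this sketch).
Parser Difference(Parser p, Parser q)
{
    return [p, q](std::string_view in, std::size_t& pos) {
        const std::size_t start = pos;
        if (!p(in, pos)) { pos = start; return false; }
        const std::size_t afterP = pos;
        pos = start;
        const bool excluded = q(in, pos);
        pos = excluded ? start : afterP;   // keep what p consumed, or nothing
        return !excluded;
    };
}
```

For example, Difference(AnyChar, Char('"')) accepts any character except a double quote, which is a common way to describe the contents of a string literal.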

List expression

If p and q are parsing‑expressions, so is p % q, a list expression. The list expression matches one or more occurrences of p separated by occurrences of q. The expression p % q has the same meaning as the expression p ( q p ) *.
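
The equivalence with p ( q p ) * can be written out by hand. In the C++ sketch below, the hypothetical functions Digit and Comma play the roles of p and q, and DigitList is the expanded list expression. Characters stand in for tokens; none of these names come from the parser generator.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Matches one decimal digit (plays the role of p).
bool Digit(std::string_view in, std::size_t& pos)
{
    if (pos >= in.size() || !std::isdigit(static_cast<unsigned char>(in[pos]))) return false;
    ++pos;
    return true;
}

// Matches one comma (plays the role of q).
bool Comma(std::string_view in, std::size_t& pos)
{
    if (pos >= in.size() || in[pos] != ',') return false;
    ++pos;
    return true;
}

// Digit % Comma, written out as Digit ( Comma Digit )*.
// Returns the number of items matched; 0 means the list did not match.
int DigitList(std::string_view in, std::size_t& pos)
{
    if (!Digit(in, pos)) return 0;
    int count = 1;
    for (;;)
    {
        const std::size_t save = pos;
        if (Comma(in, pos) && Digit(in, pos)) { ++count; continue; }
        pos = save;   // a separator without a following item is not consumed
        return count;
    }
}
```

Because the group ( q p ) must match as a whole, a trailing separator is left in the input: on "1,2,3," the sketch matches three items and stops before the last comma.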

Lookahead expression

If p is a parsing‑expression, so is & p, a lookahead expression. A lookahead expression matches expression p without consuming any input.
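
As a rough illustration, lookahead can be sketched in C++ as "try p, remember the result, then restore the position". Parser, Literal and Lookahead are hypothetical helper names for this sketch, and the input is a plain string rather than a token stream.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical parser type: true and pos advanced on a match,
// false and pos unchanged otherwise.
using Parser = std::function<bool(std::string_view, std::size_t&)>;

// Matches the literal text lit at the current position.
Parser Literal(std::string lit)
{
    return [lit](std::string_view in, std::size_t& pos) {
        if (in.substr(pos, lit.size()) != lit) return false;
        pos += lit.size();
        return true;
    };
}

// &p: reports whether p matches here, then restores the position,
// so no input is consumed in either case.
Parser Lookahead(Parser p)
{
    return [p](std::string_view in, std::size_t& pos) {
        const std::size_t save = pos;
        const bool matched = p(in, pos);
        pos = save;
        return matched;
    };
}
```

A lookahead is typically placed in front of another expression in a sequence, so that the following expression is attempted only when the input ahead has the expected shape.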

Postfix expressions

If p is a parsing‑expression, so are p *, p + and p ?, postfix‑expressions. Expression p * matches zero or more occurrences of p, p + matches one or more occurrences of p, and p ? matches zero or one occurrence of p.
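
The three postfix operators can be sketched in C++ as follows. Parser, Char, Star, Plus and Optional are hypothetical names used only for this illustration. The guard against a non-consuming match in Star is an assumption of the sketch, added to keep the loop from running forever.

```cpp
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical parser type: true and pos advanced on a match,
// false and pos unchanged otherwise.
using Parser = std::function<bool(std::string_view, std::size_t&)>;

// Returns a parser that matches exactly the character c.
Parser Char(char c)
{
    return [c](std::string_view in, std::size_t& pos) {
        if (pos >= in.size() || in[pos] != c) return false;
        ++pos;
        return true;
    };
}

// p*: zero or more occurrences; always succeeds.
Parser Star(Parser p)
{
    return [p](std::string_view in, std::size_t& pos) {
        for (;;)
        {
            const std::size_t before = pos;
            // Stop when p fails, or when p matched without consuming input.
            if (!p(in, pos) || pos == before) { pos = before; break; }
        }
        return true;
    };
}

// p+: one occurrence of p followed by p*.
Parser Plus(Parser p)
{
    return [p, rest = Star(p)](std::string_view in, std::size_t& pos) {
        return p(in, pos) && rest(in, pos);
    };
}

// p?: zero or one occurrence; always succeeds.
Parser Optional(Parser p)
{
    return [p](std::string_view in, std::size_t& pos) {
        p(in, pos);   // the result is ignored on purpose
        return true;
    };
}
```

Note that p * and p ? succeed even when p does not match at all, so only p + can make the enclosing expression fail.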

Primary expressions

A primary expression is either a rule call, a primitive expression, or a grouping expression. A primary expression can contain an expectation operator and semantic actions.

Rule call

If r is the name of an existing parsing‑rule, the expression r : i, appearing in the body of some parsing rule, matches the body of r. We might say that parsing rule r is called at the position of that expression. Similarly, the expression r ( a0, a1, ..., an ) : i calls the rule r with arguments a0, a1, ..., an. The identifier i is an instance‑identifier that is bound to an instance of rule r. You can use any name as an instance identifier as long as it is unique within the body of a rule and conforms to the syntax of an identifier.

Primitive expressions

These primitive expressions are atomic parsing expressions:

The empty expression matches the empty string without consuming any input.

The any expression matches any single token. It "eats" the token, meaning it advances the lexer to the next input token.

A token‑name expression matches a named token and advances the lexer to the next input token. A nonterminal (that is, the name of a parsing rule) and a token name are told apart by their form: in a rule call, a colon and an instance identifier follow the parsing rule name, whereas a single identifier without the colon and instance name is a token‑name expression.

The char‑literal and the string‑literal expressions are used in scannerless parsers, or parsers whose lexer is defined to be soul::lexer::trivial::TrivialLexer<char32_t>. The XML parser defined in the C:\soul-5.1.0\soul\xml\xml_parser directory is an example of a scannerless parser.

Grouping expression

If p is a parsing‑expression, so is ( p ). When a grouping expression is matched, parsing expression p is matched.

Expectation expression

If a primary expression contains an exclamation symbol after a rule call expression, primitive expression or grouping expression, and the current input does not match that expression, a parsing exception is thrown.
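
The effect of the expectation operator can be sketched in C++ as a wrapper that turns an ordinary failure into an exception. Parser, Char and Expect are hypothetical names for this illustration, and std::runtime_error is only a stand-in for the parsing exception type, which is not named in this section.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical parser type: true and pos advanced on a match,
// false and pos unchanged otherwise.
using Parser = std::function<bool(std::string_view, std::size_t&)>;

// Returns a parser that matches exactly the character c.
Parser Char(char c)
{
    return [c](std::string_view in, std::size_t& pos) {
        if (pos >= in.size() || in[pos] != c) return false;
        ++pos;
        return true;
    };
}

// p!: p is required to match here; a failure is reported by throwing
// instead of returning false (std::runtime_error is a stand-in type).
Parser Expect(Parser p, std::string what)
{
    return [p, what](std::string_view in, std::size_t& pos) {
        if (p(in, pos)) return true;
        throw std::runtime_error("expected " + what + " at position " + std::to_string(pos));
    };
}
```

Because the failure is thrown rather than returned, an enclosing choice expression does not get the chance to backtrack and try another alternative. In this sketch that is what separates an expected element from an ordinary one.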

Semantic actions

If a C++ compound statement is attached as a trailing element of a rule call expression, a primitive expression or a grouping expression, and the current input matches that expression, the compound statement, called a success action, is executed. Typically the success action returns a value. That value is bound to the instance‑identifier of the rule in a rule call. A semantic action can also assign a value to a local variable of a parsing rule.

If a slash and a C++ compound statement are appended to a success action as a trailing element, and the current input does not match the rule call expression, primitive expression or grouping expression, the compound statement, called a failure action, is executed.
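
The dispatch between the two kinds of actions can be sketched in C++ as a wrapper that runs exactly one of two callbacks. Parser, Action, Char and WithActions are hypothetical names for this illustration. The sketch does not model returning a value from the rule, and it assumes that running an action leaves the match result unchanged.

```cpp
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical parser type: true and pos advanced on a match,
// false and pos unchanged otherwise.
using Parser = std::function<bool(std::string_view, std::size_t&)>;
using Action = std::function<void()>;

// Returns a parser that matches exactly the character c.
Parser Char(char c)
{
    return [c](std::string_view in, std::size_t& pos) {
        if (pos >= in.size() || in[pos] != c) return false;
        ++pos;
        return true;
    };
}

// p{ success } / { failure }: run the success action when p matches and
// the failure action when it does not; the match result is passed through.
Parser WithActions(Parser p, Action onSuccess, Action onFailure)
{
    return [=](std::string_view in, std::size_t& pos) {
        const bool matched = p(in, pos);
        if (matched) onSuccess(); else onFailure();
        return matched;
    };
}
```

A success action would typically build or record a result for the matched input, while a failure action might undo or reset state that was prepared before the match was attempted.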

Parameters and variables available in semantic actions

Getting the lexeme of the matched token:

parser IntegerParser
{
    Integer
        ::= INTEGER{ std::u32string s = lexer.GetToken(pos).ToString(); /* do something with s */ }
        ;
}