4 Syntax of Parsing Files

Table of contents

4.1 Parser File Syntax
      4.1.1 Parser File Declaration
      4.1.2 Parser Declaration
      4.1.3 Parser Statements
      4.1.4 Lexer Statement
      4.1.5 Main Statement
      4.1.6 Using Statement
      4.1.7 Rule Statement
4.2 Parsing Expressions
      4.2.1 Choice
      4.2.2 Sequence
      4.2.3 Difference
      4.2.4 List
      4.2.5 Prefix
      4.2.6 Postfix
      4.2.7 Compound
      4.2.8 Primary
4.3 Project File Syntax

4.1 Parser File Syntax

4.1.1 Parser File Declaration

parser‑file parser‑module‑declaration imports parser‑declaration *
parser‑module‑declaration export‑module‑declaration

A parser file consists of a parser module declaration followed by module imports followed by parser declarations.

Each parser file must have a unique parser module identifier in the parser module declaration.

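Modules that must be imported include the token, lexer and parser modules that the parser uses, and the modules that contain the C++ types used in its rules.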

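The parser generator will generate a C++ module for each parser file. The module will be in two C++ source files: a module interface unit (the .ixx file) and a module implementation unit (the .cpp file).

A module import may have two kinds of prefixes: [interface] and [implementation].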

If a module import has an interface prefix, or it has no prefix, the parser generator will place the import in the generated module interface unit (the generated .ixx file). If a module import has an implementation prefix, the parser generator will place the import in the generated module implementation unit (the generated .cpp file).

Imports for the token, lexer and parser modules that the parser uses can go in the module implementation unit. If a C++ type is used as a parameter type in a rule, the import for the module containing that C++ type must be placed in the module interface unit. If a C++ type is used only as a return value type or in the implementation of a semantic action of a rule, the import can be placed in the module implementation unit. In all other cases the import can also be placed in the module implementation unit.

Example

export module soul.xml.xpath.parser;

[interface]import soul.xml.xpath.expr;
[implementation]import soul.xml.xpath.lexer;
[implementation]import soul.xml.xpath.token;
[implementation]import soul.xml.xpath.token.parser;

parser XPathParser
{
    lexer soul::xml::xpath::lexer::XPathLexer<char32_t>;
    main;
    ...
}

4.1.2 Parser Declaration

parser‑declaration parser parser‑name { parser‑statement * }
parser‑name identifier

A parser declaration consists of the keyword parser followed by a parser name followed by a sequence of parser statements enclosed in braces.

A parser name must be unique within a parser project.

Generated Class Template

The parser generator will generate a C++ class template for each parser declaration. The class template is parameterized with the type of the lexer.

// generated xpath_parser.ixx:

template<typename Lexer>
struct XPathParser
{
    static std::unique_ptr<soul::xml::xpath::expr::Expr> Parse(Lexer& lexer);
    static soul::parser::Match Expr(Lexer& lexer);
    ...
};

4.1.3 Parser Statements

parser‑statement lexer‑statement | main‑statement | using‑statement | rule‑statement

A parser statement can be either a lexer statement, a main statement, a using statement or a rule statement.

4.1.4 Lexer Statement

lexer‑statement lexer type‑id ;

A lexer statement instantiates the parser with the given lexer. There must be at least one lexer statement for a parser, but there may be more than one.

When there is more than one lexer, each lexer must produce those tokens that the rules of the parser consume.

Example

// common_parser.parser:
...
parser CommonParser
{
    lexer soul::lex::slg::SlgLexer<char32_t>;
    lexer soul::lex::spg::SpgLexer<char32_t>; 
    ...
}

Explicit Instantiation for the Lexer Type

The parser class template is explicitly instantiated at the end of the <parser‑file‑name>.cpp file for each concrete lexer type declared in a lexer statement.

// generated common_parser.cpp:
...
template struct CommonParser<soul::lexer::Lexer<soul::lex::slg::SlgLexer<char32_t>, char32_t>>;
template struct CommonParser<soul::lexer::Lexer<soul::lex::spg::SpgLexer<char32_t>, char32_t>>;
...
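
The need for explicit instantiation can be illustrated with a minimal, self-contained sketch. It assumes a parser class template whose member functions are declared in the interface and defined in the implementation file. The lexer types SlgLexer and SpgLexer and the Describe function below are hypothetical stand-ins, not the generated code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for two concrete lexer types.
struct SlgLexer { std::string Name() const { return "slg"; } };
struct SpgLexer { std::string Name() const { return "spg"; } };

// Interface part: only the declaration is visible to importers.
template<typename Lexer>
struct CommonParser
{
    static std::string Describe(Lexer& lexer);
};

// Implementation part: the definition lives in the .cpp file.
template<typename Lexer>
std::string CommonParser<Lexer>::Describe(Lexer& lexer)
{
    return "parsing with " + lexer.Name();
}

// One explicit instantiation per lexer type, so that code is emitted
// for the lexer types named in the lexer statements.
template struct CommonParser<SlgLexer>;
template struct CommonParser<SpgLexer>;

// Usage check for the sketch.
void ExplicitInstantiationExample()
{
    SpgLexer lexer;
    assert(CommonParser<SpgLexer>::Describe(lexer) == "parsing with spg");
}
```

Because the member definitions are not visible to importers in this arrangement, a lexer type without a matching explicit instantiation would typically fail at link time.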

4.1.5 Main Statement

main‑statement main ;

The main statement declares that the parser is a "main" parser. There may be more than one main parser in a given parser project. The parser generator generates a Parse-function that acts as the parser entry point for the main parser.

Example

// xpath_parser.parser:
...
parser XPathParser
{
    lexer soul::xml::xpath::lexer::XPathLexer<char32_t>;
    main;

    Expr : soul::xml::xpath::expr::Expr*
        ::= 
        (
            OrExpr:orExpr
        )
        { 
            return orExpr; 
        }
        ;
    ...
}

Generated Parser Interface

The Parse-function takes a lexer as the first parameter and then the same parameters as the first rule of the parser. It returns the value that the first rule returns, with one exception: if the first rule returns a pointer type Foo*, the Parse-function returns a std::unique_ptr<Foo>.

// generated xpath_parser.ixx:

template<typename Lexer>
struct XPathParser
{
    static std::unique_ptr<soul::xml::xpath::expr::Expr> Parse(Lexer& lexer);
    static soul::parser::Match Expr(Lexer& lexer);
    ...
};
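
The return type mapping described above can be expressed as a small type trait. This is only a sketch: ParseResult is a hypothetical name introduced for illustration, and Expr is a stand-in for soul::xml::xpath::expr::Expr. The parser generator performs this mapping when it emits code and is not claimed to use such a trait.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical trait: maps the return type of the first rule to the
// return type of the Parse-function.
template<typename T>
struct ParseResult
{
    using type = T;   // non-pointer types are returned unchanged
};

// A raw pointer Foo* becomes an owning std::unique_ptr<Foo>.
template<typename T>
struct ParseResult<T*>
{
    using type = std::unique_ptr<T>;
};

struct Expr {};   // stand-in for soul::xml::xpath::expr::Expr

static_assert(std::is_same<ParseResult<Expr*>::type, std::unique_ptr<Expr>>::value,
    "pointer return types are wrapped in std::unique_ptr");
static_assert(std::is_same<ParseResult<int>::type, int>::value,
    "other return types are kept as they are");
```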

Generated Parse-function Implementation

The Parse-function advances the lexer to the first token of the input and then calls the first rule of the parser with the lexer.

// generated xpath_parser.cpp:

namespace soul::xml::xpath::parser {

template<typename Lexer>
std::unique_ptr<soul::xml::xpath::expr::Expr> XPathParser<Lexer>::Parse(Lexer& lexer)
{
    ...
    ++lexer;
    soul::parser::Match match = XPathParser<Lexer>::Expr(lexer);
    ...
}
...

4.1.6 Using Statement

using‑statement using parser‑and‑rule‑name ;
parser‑and‑rule‑name qualified‑id

The using statement takes the form using <parser>.<rule>; and imports a rule from another parser into the scope of the current parser, so that the rules of the current parser can use (that is, "call") the imported rule. In addition to adding using statements, the modules that contain the used parsers must be imported at the start of the parser file.

Example

// statement_parser.parser:

export module soul.cpp.statement.parser;

[interface]import soul.ast.cpp;
[implementation]import soul.cpp.token;
[implementation]import soul.cpp.op.token;
[implementation]import soul.punctuation.token;
[implementation]import soul.cpp.declaration.parser;
[implementation]import soul.cpp.declarator.parser;
[implementation]import soul.cpp.expression.parser;
[implementation]import soul.lex.slg;
[implementation]import soul.lex.spg;

parser StatementParser
{
    lexer soul::lex::slg::SlgLexer<char32_t>;
    lexer soul::lex::spg::SpgLexer<char32_t>;

    using DeclarationParser.SimpleDeclaration;
    using DeclarationParser.BlockDeclaration;
    using DeclarationParser.DeclSpecifierSeq;
    using DeclaratorParser.TypeId;
    using DeclaratorParser.TypeSpecifierSeq;
    using DeclaratorParser.Declarator;
    using DeclaratorParser.AbstractDeclarator;
    using ExpressionParser.Expression;
    using ExpressionParser.ConstantExpression;
    using ExpressionParser.AssignmentExpression;
    ...
}

4.1.7 Rule Statement

rule‑statement rule‑header ::= rule‑body ;
rule‑header rule‑name parameters‑and‑variables ? return‑value ?
rule‑name identifier
parameters‑and‑variables ( ( param‑or‑variable ( , param‑or‑variable )* )? )
param‑or‑variable variable | parameter
variable var variable‑type variable‑declarator
variable‑type type‑id
variable‑declarator declarator
parameter parameter‑type parameter‑declarator
parameter‑type type‑id
parameter‑declarator declarator
return‑value : type‑id
rule‑body choice

The rule statement declares a parsing rule. It consists of the rule header followed by the ::= symbol (pronounced 'produces') followed by the rule body and terminated by the semicolon.

The rule header consists of the rule name followed by an optional list of parameters and variables followed by an optional return value.

A rule must have a unique name within a parser.

A variable declaration begins with the keyword var followed by the C++ type and declarator of the variable.

A parameter declaration lacks the var keyword and is otherwise similar to the variable declaration.

If the rule returns a value, the return value declaration begins with the : symbol and is followed by a C++ type.

The rule body consists of a 'choice' parsing expression.

Example

// rules of statement_parser.parser:

parser StatementParser
{
    ...
    
    Statement : soul::ast::cpp::StatementNode*
        ::= ...
        ;
        
    LabeledStatement(var std::string label) : soul::ast::cpp::StatementNode*
        ::= ...
        ;
        
    EmptyStatement : soul::ast::cpp::StatementNode*
        ::= ...
        ;
    
    ...
}

Generated Rule Functions

The parser generator will generate a static member function for each rule in the class template of the parser.

// generated statement_parser.ixx:
...
template<typename Lexer>
struct StatementParser
{
    static soul::parser::Match Statement(Lexer& lexer);
    static soul::parser::Match LabeledStatement(Lexer& lexer);
    static soul::parser::Match EmptyStatement(Lexer& lexer);
    ...
};
...

4.2 Parsing Expressions

The body of a rule consists of a combination of parsing expressions.

These are the main categories of parsing expressions: choice, sequence, difference, list, prefix, postfix, compound and primary.

The parser generator will generate a C++ compound statement in the rule function for each kind of parsing expression. These compound statements are called component parsers.

4.2.1 Choice

choice sequence ( | sequence )*

A 'choice' parsing expression consists of a nonempty sequence of 'sequence' parsing expressions separated by the | symbol. The 'sequence' parsing expressions are also called the 'choices'.

The generated component parser matches input against the choices starting with the leftmost choice and proceeding to the right. The parser always accepts the first matching choice and does not try the choices that follow it. Before trying each choice, the parser rewinds the input to the position where it started matching the first choice. If none of the choices match, the parser backtracks the input to the starting position of the first choice and returns failure to the parent component parser.

Example

// a 'choice':
...
parser StatementParser
{
    ...
    Statement : soul::ast::cpp::StatementNode*
        ::= LabeledStatement:labeledStatement{ return labeledStatement; }
        |   EmptyStatement:emptyStatement{ return emptyStatement; }
        |   CompoundStatement:compoundStatement{ return compoundStatement; }
        |   SelectionStatement:selectionStatement{ return selectionStatement; }
        |   IterationStatement:iterationStatement{ return iterationStatement; }
        |   JumpStatement:jumpStatement{ return jumpStatement; }
        |   DeclarationStatement:declarationStatement{ return declarationStatement; }
        |   TryStatement:tryStatement{ return tryStatement; }
        |   ExpressionStatement:expressionStatement{ return expressionStatement; }  
        ;
    ...
}
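
The ordered choice behavior can be sketched in a few lines of C++. This is a simplified model, not the generated code: it assumes that the input is a plain string instead of a token stream, and the names Parser, Char and Choice are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model: a component parser returns true on a match and
// advances pos; on failure it leaves pos unchanged.
using Parser = std::function<bool(const std::string& input, std::size_t& pos)>;

// Matches the single character c.
Parser Char(char c)
{
    return [c](const std::string& input, std::size_t& pos)
    {
        if (pos < input.size() && input[pos] == c)
        {
            ++pos;
            return true;
        }
        return false;
    };
}

// Ordered choice: try the alternatives from left to right and accept the first match.
Parser Choice(std::vector<Parser> alternatives)
{
    return [alternatives](const std::string& input, std::size_t& pos)
    {
        const std::size_t start = pos;
        for (const Parser& alternative : alternatives)
        {
            pos = start;   // rewind before each attempt
            if (alternative(input, pos)) return true;
        }
        pos = start;       // nothing matched: backtrack and fail
        return false;
    };
}

// Usage: "b" is accepted by the second alternative, "c" by none.
void ChoiceExample()
{
    Parser p = Choice({ Char('a'), Char('b') });
    std::size_t pos = 0;
    assert(p("b", pos) && pos == 1);
    pos = 0;
    assert(!p("c", pos) && pos == 0);
}
```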

4.2.2 Sequence

sequence difference difference *

A 'sequence' parsing expression consists of a nonempty sequence of 'difference' parsing expressions. Let's call the 'difference' parsing expressions components c_1, c_2, ..., c_n, for n ≥ 1.

The generated component parser matches input against each of the components c_i in sequence for i = 1, ..., n. If all the components match, the parser accepts the string formed by concatenating all the matches. Otherwise the parser backtracks the input to the starting position of c_1 and returns failure to the parent component parser.

Example

// a 'sequence':
...
parser StatementParser
{
    ...
    WhileStatement : soul::ast::cpp::StatementNode*
        ::= 
        (
            WHILE LPAREN! Condition:cond! RPAREN! Statement:stmt!
        )
        {
            return new soul::ast::cpp::WhileStatementNode(lexer.GetSourcePos(pos), cond, stmt);
        }
        ;
    ...
}
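
A sequence can be modeled in the same simplified way. The sketch below assumes a string input instead of a token stream, and the names Parser, Char and Sequence are hypothetical. It is meant to illustrate the backtracking rule only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model: a component parser returns true on a match and
// advances pos; on failure it leaves pos unchanged.
using Parser = std::function<bool(const std::string& input, std::size_t& pos)>;

// Matches the single character c.
Parser Char(char c)
{
    return [c](const std::string& input, std::size_t& pos)
    {
        if (pos < input.size() && input[pos] == c)
        {
            ++pos;
            return true;
        }
        return false;
    };
}

// Sequence: every component must match, one after another.
Parser Sequence(std::vector<Parser> components)
{
    return [components](const std::string& input, std::size_t& pos)
    {
        const std::size_t start = pos;
        for (const Parser& component : components)
        {
            if (!component(input, pos))
            {
                pos = start;   // backtrack to the start of the first component
                return false;
            }
        }
        return true;
    };
}

// Usage: "ab" matches as a whole; "ac" fails and consumes nothing.
void SequenceExample()
{
    Parser ab = Sequence({ Char('a'), Char('b') });
    std::size_t pos = 0;
    assert(ab("ab", pos) && pos == 2);
    pos = 0;
    assert(!ab("ac", pos) && pos == 0);
}
```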

4.2.3 Difference

difference list ( - list )*

A 'difference' parsing expression consists of a nonempty sequence of 'list' parsing expressions separated by the - (dash) symbol.

For a difference parser a - b, the generated component parser first matches input against the component parser a. If a matches, the parser backtracks the input to the starting position of a and tries to match b. If b does not match, the generated parser restores the input to the position where the lexer was at the end of matching a and returns success to the parent component parser. Otherwise, if both a and b match, or a does not match, the generated parser backtracks the input to the starting position of a and returns failure to the parent component parser. Informally, the difference parser a - b means "match a but not b".

Example

// a 'difference':
...
parser XPathParser
{
    ...
    PathExpr(var std::unique_ptr<soul::xml::xpath::expr::Expr> expr) : soul::xml::xpath::expr::Expr*
        ::= 
        (
            (
                LocationPath:locationPath - FunctionCall:functionCall
            )
            { 
                return locationPath; 
            }
        )
    ...
}
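
The "match a but not b" rule can be sketched as follows. The model assumes a string input instead of a token stream, and Parser, Char, AnyChar and Difference are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical model: a component parser returns true on a match and
// advances pos; on failure it leaves pos unchanged.
using Parser = std::function<bool(const std::string& input, std::size_t& pos)>;

// Matches the single character c.
Parser Char(char c)
{
    return [c](const std::string& input, std::size_t& pos)
    {
        if (pos < input.size() && input[pos] == c)
        {
            ++pos;
            return true;
        }
        return false;
    };
}

// Matches any single character of a nonempty input.
Parser AnyChar()
{
    return [](const std::string& input, std::size_t& pos)
    {
        if (pos < input.size())
        {
            ++pos;
            return true;
        }
        return false;
    };
}

// Difference a - b: succeeds when a matches and b does not match at the same position.
Parser Difference(Parser a, Parser b)
{
    return [a, b](const std::string& input, std::size_t& pos)
    {
        const std::size_t start = pos;
        if (!a(input, pos))
        {
            pos = start;
            return false;
        }
        const std::size_t endOfA = pos;
        pos = start;             // backtrack and try b from the same position
        if (b(input, pos))
        {
            pos = start;         // both matched: fail without consuming input
            return false;
        }
        pos = endOfA;            // only a matched: keep the match of a
        return true;
    };
}

// Usage: any character except a right parenthesis.
void DifferenceExample()
{
    Parser notRightParen = Difference(AnyChar(), Char(')'));
    std::size_t pos = 0;
    assert(notRightParen("x", pos) && pos == 1);
    pos = 0;
    assert(!notRightParen(")", pos) && pos == 0);
}
```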

4.2.4 List

list prefix ( % prefix )?

A 'list' parsing expression consists of a 'prefix' parsing expression optionally followed by the % symbol and another 'prefix' parsing expression.

A 'list' parsing expression a % b is a shorthand notation for the parsing expression a (b a)*, that is, one or more a's separated by b's.

Example

// a 'list':

parser ExpressionParser
{
    ...
    ExpressionList(soul::ast::cpp::Node* owner)
        ::= AssignmentExpression:expr{ owner->Add(expr); } % COMMA
        ;
    ...
}
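
The expansion of a % b into a (b a)* can be sketched directly. The model below assumes a string input instead of a token stream, and Parser, Char and List are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical model: a component parser returns true on a match and
// advances pos; on failure it leaves pos unchanged.
using Parser = std::function<bool(const std::string& input, std::size_t& pos)>;

// Matches the single character c.
Parser Char(char c)
{
    return [c](const std::string& input, std::size_t& pos)
    {
        if (pos < input.size() && input[pos] == c)
        {
            ++pos;
            return true;
        }
        return false;
    };
}

// List item % separator, modeled as item (separator item)*.
Parser List(Parser item, Parser separator)
{
    return [item, separator](const std::string& input, std::size_t& pos)
    {
        if (!item(input, pos)) return false;   // at least one item is required
        for (;;)
        {
            const std::size_t save = pos;
            if (!(separator(input, pos) && item(input, pos)))
            {
                pos = save;   // do not consume a separator that has no item after it
                return true;
            }
        }
    };
}

// Usage: items 'x' separated by commas.
void ListExample()
{
    Parser list = List(Char('x'), Char(','));
    std::size_t pos = 0;
    assert(list("x,x,x", pos) && pos == 5);
    pos = 0;
    assert(list("x,x,", pos) && pos == 3);   // the trailing comma is left in the input
    pos = 0;
    assert(!list(",", pos) && pos == 0);
}
```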

4.2.5 Prefix

prefix lookahead | postfix
lookahead & postfix

A 'prefix' parsing expression consists of either a 'lookahead' or a 'postfix' parsing expression.

A 'lookahead' parsing expression consists of the & symbol followed by a 'postfix' parsing expression.

For a lookahead parser &a, the generated component parser will try to match a. If a matches, the generated parser will backtrack to the starting position of a and return success to the parent component parser. Otherwise the generated parser will backtrack to the starting position of a and return failure to the parent component parser. This allows arbitrary lookahead without consuming input.

Example

// a 'lookahead':

parser TemplateParser
{
    ...
    TemplateArgument(sngcpp::symbols::Context* ctx, sngcpp::ast::Node* templateIdNode, int index) : Node*
        ::= ... TypeId(ctx):typeId TemplateArgNext:next
            ...
        |   ConstantExpression(ctx):constantExpr{ return constantExpr; }
        |   IdExpression(ctx):idExpr{ return idExpr; }
        ;

    TemplateArgNext
        ::= &(RANGLE | COMMA | ELLIPSIS)
        ;
    ...
}
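
A lookahead can be sketched as a wrapper that always restores the input position. As before, this is a simplified model that assumes a string input instead of a token stream. Parser, Char and Lookahead are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical model: a component parser returns true on a match and
// advances pos; on failure it leaves pos unchanged.
using Parser = std::function<bool(const std::string& input, std::size_t& pos)>;

// Matches the single character c.
Parser Char(char c)
{
    return [c](const std::string& input, std::size_t& pos)
    {
        if (pos < input.size() && input[pos] == c)
        {
            ++pos;
            return true;
        }
        return false;
    };
}

// Lookahead &a: reports whether a would match, but never consumes input.
Parser Lookahead(Parser a)
{
    return [a](const std::string& input, std::size_t& pos)
    {
        const std::size_t start = pos;
        const bool matched = a(input, pos);
        pos = start;   // restore the position whether a matched or not
        return matched;
    };
}

// Usage: check for a following '>' without consuming it.
void LookaheadExample()
{
    Parser next = Lookahead(Char('>'));
    std::size_t pos = 0;
    assert(next(">", pos) && pos == 0);
    assert(!next(",", pos) && pos == 0);
}
```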

4.2.6 Postfix

postfix compound ( * | + | ? )?

A 'postfix' parsing expression consists of a 'compound' parsing expression optionally followed by one of the symbols * , + or ? .

A 'compound' parsing expression followed by the * symbol forms a 'kleene-star' parsing expression.

A 'compound' parsing expression followed by the + symbol forms a 'positive' parsing expression.

A 'compound' parsing expression followed by the ? symbol forms an 'optional' parsing expression.

For a 'kleene-star' parser a*, the generated component parser matches a repeatedly until a fails to match. It then backtracks the input to the position where it was at the end of the last successful match of a, or to the position where it was at the start of matching the first a if a matched zero times, and returns success to the parent component parser. Thus the generated parser matches a zero or more times.

Example

// a 'kleene':
...
parser ExpressionParser
{
    ...
    LogicalOrExpression(var std::unique_ptr<soul::ast::cpp::Node> expr) : soul::ast::cpp::Node*
        ::= 
        (   LogicalAndExpression:left{ expr.reset(left); }
            (
                DISJUNCTION LogicalAndExpression:right!{ ... }
            )*
        )
        {
            return expr.release();
        }
        ;
    ...
}

For a 'positive' parser a+, the generated component parser matches a repeatedly until a fails to match. If a matched at least once, the generated parser backtracks the input to the position where it was at the end of the last successful match of a and returns success to the parent component parser. Otherwise the generated parser backtracks to the position where it was at the start of matching the first a and returns failure to the parent component parser. Thus the generated parser matches a one or more times.

Example

// a 'positive':

parser DeclarationParser
{
    ...
    DeclSpecifierSeq(soul::ast::cpp::SimpleDeclarationNode* declaration)
        ::= 
        (
            (
                DeclSpecifier:declSpecifier{ declaration->Add(declSpecifier); }
            )+
            |   TypeName:typeName{ declaration->Add(typeName); }
        )
        ;
    ...
}

For an 'optional' parser a?, the generated component parser will try to match a. If a matches, the generated parser returns success to the parent component parser. Otherwise the generated parser backtracks to the position where it was at the start of matching a and still returns success to the parent component parser. Thus the generated parser matches a zero times or once.

Example

// an 'optional':
...
parser StatementParser
{
    ...
    ReturnStatement : soul::ast::cpp::StatementNode*
        ::= 
        (
            RETURN Expression:returnValue? SEMICOLON!
        )
        { 
            return new soul::ast::cpp::ReturnStatementNode(lexer.GetSourcePos(pos), returnValue); 
        }
        ;
    ...
}
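
The three postfix operators can be sketched side by side. The model assumes a string input instead of a token stream, and the names Parser, Char, Kleene, Positive and Optional are hypothetical. The check for an empty match in Kleene is an addition of this sketch that keeps the loop from running forever.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical model: a component parser returns true on a match and
// advances pos; on failure it leaves pos unchanged.
using Parser = std::function<bool(const std::string& input, std::size_t& pos)>;

// Matches the single character c.
Parser Char(char c)
{
    return [c](const std::string& input, std::size_t& pos)
    {
        if (pos < input.size() && input[pos] == c)
        {
            ++pos;
            return true;
        }
        return false;
    };
}

// a*: zero or more matches, always succeeds.
Parser Kleene(Parser a)
{
    return [a](const std::string& input, std::size_t& pos)
    {
        for (;;)
        {
            const std::size_t save = pos;
            if (!a(input, pos) || pos == save)   // stop on failure or on an empty match
            {
                pos = save;   // end of the last successful match
                return true;
            }
        }
    };
}

// a+: one match of a followed by a*.
Parser Positive(Parser a)
{
    Parser rest = Kleene(a);
    return [a, rest](const std::string& input, std::size_t& pos)
    {
        const std::size_t start = pos;
        if (!a(input, pos))
        {
            pos = start;
            return false;
        }
        return rest(input, pos);
    };
}

// a?: zero or one match, always succeeds.
Parser Optional(Parser a)
{
    return [a](const std::string& input, std::size_t& pos)
    {
        const std::size_t start = pos;
        if (!a(input, pos)) pos = start;
        return true;
    };
}

// Usage on inputs with and without a leading 'a'.
void PostfixExample()
{
    std::size_t pos = 0;
    assert(Kleene(Char('a'))("aab", pos) && pos == 2);
    pos = 0;
    assert(Kleene(Char('a'))("b", pos) && pos == 0);
    assert(!Positive(Char('a'))("b", pos) && pos == 0);
    assert(Optional(Char('a'))("b", pos) && pos == 0);
}
```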

4.2.7 Compound

compound primary expectation ? ( success‑action failure‑action ? )?
expectation !
success‑action compound‑statement
failure‑action / compound‑statement

A 'compound' parsing expression consists of a 'primary' parsing expression optionally followed by an 'expectation' symbol optionally followed by a 'success-action' that is optionally followed by a 'failure-action'.

A 'primary' parsing expression followed by an 'expectation' symbol forms an 'expectation' parsing expression. For an expectation parser a!, the generated component parser tries to match a. If a matches, the component parser returns success to the parent component parser. Otherwise the generated parser throws an exception that contains the source location of the current input position and information about the token or rule that was expected to match.
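
The expectation behavior can be sketched as a wrapper that turns a failed match into an exception. This is a simplified model that assumes a string input instead of a token stream. Parser, Char and Expect are hypothetical names, and std::runtime_error with a plain character offset stands in for the exception type and source location that the real parser uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical model: a component parser returns true on a match and
// advances pos; on failure it leaves pos unchanged.
using Parser = std::function<bool(const std::string& input, std::size_t& pos)>;

// Matches the single character c.
Parser Char(char c)
{
    return [c](const std::string& input, std::size_t& pos)
    {
        if (pos < input.size() && input[pos] == c)
        {
            ++pos;
            return true;
        }
        return false;
    };
}

// Expectation a!: a failed match is an error instead of a reason to backtrack.
Parser Expect(Parser a, std::string what)
{
    return [a, what](const std::string& input, std::size_t& pos)
    {
        if (a(input, pos)) return true;
        throw std::runtime_error("expected " + what + " at position " + std::to_string(pos));
    };
}

// Usage: a mandatory semicolon.
void ExpectationExample()
{
    Parser semicolon = Expect(Char(';'), "';'");
    std::size_t pos = 0;
    assert(semicolon(";", pos) && pos == 1);
    bool thrown = false;
    pos = 0;
    try
    {
        semicolon("x", pos);
    }
    catch (const std::runtime_error&)
    {
        thrown = true;
    }
    assert(thrown);
}
```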

A 'success-action' is a C++ compound statement that is executed when input matches its preceding parsing expression, either a 'primary' parsing expression or an 'expectation' parsing expression. If the current rule returns a value, the success action typically returns a parsed information item.

Example

// a 'success' action:

parser StatementParser
{
    ...
    BreakStatement : soul::ast::cpp::StatementNode*
        ::= 
        (
            BREAK SEMICOLON!
        )
        { 
            return new soul::ast::cpp::BreakStatementNode(lexer.GetSourcePos(pos)); 
        }
        ;
    ...
}

A 'failure-action' consists of the / symbol followed by a C++ compound statement that is executed when input does not match its preceding parsing expression. For example, if the preceding semantic actions of the current rule have changed some global state, the failure action can restore that state.

Example

// a 'failure' action:

parser ConceptParser
{
    ...
    ConceptDefinition(sngcpp::symbols::Context* ctx, var SourcePos s) : Node*
        ::= 
        (
            ...
            Assign:assign{ ctx->PushSetFlag(sngcpp::symbols::ContextFlags::parsingConceptDefinition); }
            ConstraintExpression(ctx):constraintExpr{ ctx->PopFlags(); } / { ctx->PopFlags(); }
            ...
        )
        ;
    ...
}
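
The interplay of success and failure actions can be sketched as follows. The model assumes a string input instead of a token stream, and Parser, Char and WithActions are hypothetical names. A std::vector<int> stands in for the flag stack of the context object in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model: a component parser returns true on a match and
// advances pos; on failure it leaves pos unchanged.
using Parser = std::function<bool(const std::string& input, std::size_t& pos)>;

// Matches the single character c.
Parser Char(char c)
{
    return [c](const std::string& input, std::size_t& pos)
    {
        if (pos < input.size() && input[pos] == c)
        {
            ++pos;
            return true;
        }
        return false;
    };
}

// Runs onSuccess when a matches and onFailure when it does not.
Parser WithActions(Parser a, std::function<void()> onSuccess, std::function<void()> onFailure)
{
    return [a, onSuccess, onFailure](const std::string& input, std::size_t& pos)
    {
        if (a(input, pos))
        {
            onSuccess();
            return true;
        }
        onFailure();
        return false;
    };
}

// Usage: a flag pushed before the match is popped on both outcomes.
void FailureActionExample()
{
    std::vector<int> flags;   // stand-in for the context flag stack
    Parser p = WithActions(Char('c'),
        [&flags] { flags.pop_back(); },
        [&flags] { flags.pop_back(); });
    std::size_t pos = 0;
    flags.push_back(1);
    assert(!p("x", pos) && flags.empty());
    flags.push_back(1);
    assert(p("c", pos) && flags.empty());
}
```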

Variables for Semantic Actions

There are special variables available for use in semantic actions:

The 'lexer' variable

The current lexer can be accessed using the 'lexer' variable.

The 'pos' variable

The 'pos' variable is a 64-bit integer that contains the lexer position of the current token. The upper (leftmost) 32 bits contain the line number of the token and the lower (rightmost) 32 bits contain an index into a vector of tokens inside the lexer.
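
The bit layout can be sketched with plain shift and mask operations. MakePos, LineOf and TokenIndexOf are hypothetical helper names introduced for illustration, and the use of a signed std::int64_t is an assumption. The lexer's own interface for these values is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Packs a line number into the upper 32 bits and a token index into the lower 32 bits.
constexpr std::int64_t MakePos(std::int32_t line, std::int32_t tokenIndex)
{
    return (static_cast<std::int64_t>(line) << 32) | static_cast<std::uint32_t>(tokenIndex);
}

// Extracts the line number from the upper 32 bits.
constexpr std::int32_t LineOf(std::int64_t pos)
{
    return static_cast<std::int32_t>(pos >> 32);
}

// Extracts the token index from the lower 32 bits.
constexpr std::int32_t TokenIndexOf(std::int64_t pos)
{
    return static_cast<std::int32_t>(pos & 0xFFFFFFFF);
}

// Usage: line 12, token index 345.
void PosExample()
{
    const std::int64_t pos = MakePos(12, 345);
    assert(LineOf(pos) == 12);
    assert(TokenIndexOf(pos) == 345);
}
```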

You can access the current lexer token by using the expression lexer.GetToken(pos). For example, to convert the matching lexeme of the current token to a UTF-8 string, you can use the expression util::ToUtf8(lexer.GetToken(pos).ToString()).

Example

// 'pos':
...
parser XPathParser
{
    ...
    NCName : std::string
        ::= NAME{ return util::ToUtf8(lexer.GetToken(pos).ToString()); }
        ;
    ...
}

The 'pass' variable

'pass' is a Boolean variable. By setting the 'pass' variable to false, a semantic action can conditionally reject the preceding parsing expression and cause the parser to backtrack and try the next matching choice. In that case the parser behaves as if the parsing expression preceding the semantic action had not matched, although it actually has. This is useful, for example, for implementing context-dependent keywords. A context-dependent keyword behaves like a keyword only in certain positions of the syntax; in other positions it can be used as an ordinary name.

Example

// 'pass':

parser XPathParser
{
    ...
    OrExpr(var std::unique_ptr<soul::xml::xpath::expr::Expr> expr) : soul::xml::xpath::expr::Expr*
        ::= 
        (
            AndExpr:left{ expr.reset(left); }
            (
                OrKeyword:or_ AndExpr:right{ ... }
            )*
        )
        {
            return expr.release();
        }
        ;
    ...
    OrKeyword 
        ::= NAME{ pass = lexer.GetKeywordToken(lexer.GetToken(pos).match) == OR; }
        ;
    ...
}
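
The effect of 'pass' can be sketched with a wrapper that lets an action veto a successful match. The model assumes a string input instead of a token stream, and Parser, Action, Name and WithAction are hypothetical names. The OrKeyword parser below mirrors the rule of the same name in the example above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical model: a component parser returns true on a match and
// advances pos; on failure it leaves pos unchanged.
using Parser = std::function<bool(const std::string& input, std::size_t& pos)>;

// A semantic action sees the matched lexeme and may set pass to false.
using Action = std::function<void(const std::string& lexeme, bool& pass)>;

// Matches a nonempty run of letters.
Parser Name()
{
    return [](const std::string& input, std::size_t& pos)
    {
        const std::size_t start = pos;
        while (pos < input.size() && std::isalpha(static_cast<unsigned char>(input[pos]))) ++pos;
        return pos > start;
    };
}

// Runs the action after a successful match; the action can reject the match.
Parser WithAction(Parser a, Action action)
{
    return [a, action](const std::string& input, std::size_t& pos)
    {
        const std::size_t start = pos;
        if (!a(input, pos)) return false;
        bool pass = true;
        action(input.substr(start, pos - start), pass);
        if (!pass)
        {
            pos = start;   // behave as if a had not matched
            return false;
        }
        return true;
    };
}

// Usage: a name counts as the keyword only if its lexeme is "or".
void PassExample()
{
    Parser orKeyword = WithAction(Name(),
        [](const std::string& lexeme, bool& pass) { pass = lexeme == "or"; });
    std::size_t pos = 0;
    assert(orKeyword("or", pos) && pos == 2);
    pos = 0;
    assert(!orKeyword("and", pos) && pos == 0);
}
```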

The 'vars' variable

The 'vars' variable is a pointer variable that can be used in a semantic action to access lexer variables. A lexer variable var can be accessed by using the expression vars->var.

Example

// 'vars':
...
parser DeclarationParser
{
    ...
    TypeName(var std::unique_ptr<soul::ast::cpp::TypeNameNode> typeName) : soul::ast::cpp::TypeNameNode*
        ::=
        (
            QualifiedCppId:qid{ typeName.reset(new soul::ast::cpp::TypeNameNode(lexer.GetSourcePos(pos), qid)); }
            (
                LANGLE{ ++vars->leftAngleCount; typeName->SetTemplate(); } 
                TemplateArgumentList(typeName.get()):args 
                RANGLE{ --vars->leftAngleCount; }
            )?
        )
        {
            return typeName.release();
        }
        ;
    ...
}

4.2.8 Primary

primary nonterminal | primitive | group

A 'primary' parsing expression consists either of a 'nonterminal', a 'primitive' or a 'group' parsing expression.

Nonterminal

nonterminal rule‑name argument‑list ? instance‑name
argument‑list ( expression‑list )
instance‑name : identifier

A 'nonterminal' parsing expression consists of a rule name optionally followed by an argument list followed by a nonterminal instance name.

A nonterminal instance name consists of the : symbol and an identifier that must be unique within a parsing rule.

The : symbol is mandatory and separates a 'nonterminal' parsing expression from a 'token-parser' parsing expression that also starts with an identifier.

An argument list consists of C++ expressions separated by commas and enclosed in parentheses.

Note: if an argument list is present, there must not be white space between the rule name identifier and the left parenthesis of the argument list. Otherwise the 'nonterminal' parsing expression would conflict with the 'group' parsing expression.

The generated component parser calls a rule function with the given arguments. The number of arguments must match the number of parameters the rule takes. If the rule returns success, the instance name is bound to the return value of the called rule if any and the generated component parser returns success to its parent component parser. Otherwise the generated component parser returns failure to its parent component parser.

Example

// 'nonterminal's:

parser ParserFileParser
{
    ...
    ParserStatement(soul::ast::spg::GrammarParser* parser)
        ::= LexerStatement(parser):lexerStatement
        |   MainStatement(parser):mainStatement
        |   UsingStatement(parser):usingStatement
        |   RuleStatement(parser):ruleStatement
        ;
    ...
}

Primitive

primitive empty | any | token‑parser | lexerless‑parser
token‑parser token‑name
token‑name identifier

A 'primitive' parsing expression consists either of an 'empty' parsing expression, an 'any' parsing expression, a 'token-parser' parsing expression or a 'lexerless-parser' parsing expression.

An 'empty' parsing expression consists of the keyword empty. It always matches and does not consume any input.

Example

// 'empty':

parser ParserFileParser
{
    ...
    ParserFile(var std::unique_ptr<soul::ast::spg::ParserFile> parserFile) : soul::ast::spg::ParserFile*
        ::= 
        (
            empty{ parserFile.reset(new soul::ast::spg::ParserFile(lexer.FileName())); }
            ...
        )
        ;
    ...
}

An 'any' parsing expression consists of the keyword any. It matches a nonempty input and consumes ("eats") any single input token.

Example

// 'any':

parser DeclarationParser
{
    ...
    ParenthesizedTokens
        ::= LPAREN ((any - (LPAREN | RPAREN)) | ParenthesizedTokens:ptokens)* RPAREN
        ;
}

A 'token-parser' parsing expression consists of a token name. If the ID of the current input token of the lexer matches the ID of the token of the 'token‑parser', the parser consumes the token and advances the input position of the lexer.

Example

// tokens:
...
parser LiteralParser
{
    ...
    Literal : soul::ast::cpp::LiteralNode*
        ::= INTEGER_LITERAL{ return new soul::ast::cpp::LiteralNode(lexer.GetSourcePos(pos), util::ToUtf8(lexer.GetToken(pos).ToString())); }
        |   FLOATING_LITERAL{ return new soul::ast::cpp::LiteralNode(lexer.GetSourcePos(pos), util::ToUtf8(lexer.GetToken(pos).ToString())); }
        |   CHAR_LITERAL{ return new soul::ast::cpp::LiteralNode(lexer.GetSourcePos(pos), util::ToUtf8(lexer.GetToken(pos).ToString())); }
        |   STRING_LITERAL{ return new soul::ast::cpp::LiteralNode(lexer.GetSourcePos(pos), util::ToUtf8(lexer.GetToken(pos).ToString())); }
        |   TRUE{ return new soul::ast::cpp::LiteralNode(lexer.GetSourcePos(pos), util::ToUtf8(lexer.GetToken(pos).ToString())); }
        |   FALSE { return new soul::ast::cpp::LiteralNode(lexer.GetSourcePos(pos), util::ToUtf8(lexer.GetToken(pos).ToString())); }
        |   NULLPTR{ return new soul::ast::cpp::LiteralNode(lexer.GetSourcePos(pos), util::ToUtf8(lexer.GetToken(pos).ToString())); }
        ;
    ...
}

The 'lexerless-parser' parsing expression is applicable when the lexer that the parser has been instantiated with is soul::lexer::trivial::TrivialLexer<Char> for some Char type.

lexerless‑parser character‑parser | string‑or‑character‑class‑parser
character‑parser char‑literal
string‑or‑character‑class‑parser string‑literal

The 'lexerless-parser' parsing expression consists of either a 'character-parser' parsing expression or a 'string-or-character-class-parser' parsing expression.

The 'character-parser' parsing expression consists of a character literal. If the current input character of the lexer matches the character literal, the parser consumes the character and advances the input position of the lexer.

The 'string-or-character-class-parser' parsing expression consists of a string literal. If the string literal contains a character class enclosed in square brackets, similar to a character class in a regular expression, the parsing expression forms a 'character-class-parser' parsing expression. Otherwise it forms a 'string-parser' parsing expression.

The 'character-class-parser' parsing expression matches the current input character of the lexer to the character set constructed from the character class. If the current input character matches, the parser consumes the character and advances the input position of the lexer.

The 'string-parser' parsing expression matches a string of input characters starting with the current input character of the lexer to the string literal of the string parser. If the string matches, the parser consumes the characters and advances the input position of the lexer by the length of the string literal.

Group

group ( choice )

The 'group' parsing expression consists of a ( symbol followed by a 'choice' parsing expression followed by a ) symbol. The evaluation precedence can be changed by using the 'group' parsing expression.

For a group parser ( a ), the generated component parser tries to match a. If a matches, the generated component parser returns success to its parent component parser. Otherwise it backtracks the input to the position where the lexer was when it started to match a and returns failure to its parent component parser.

Example

// 'group':

parser RexParser
{
    ...
    Repetition(soul::rex::context::Context* context, var soul::rex::nfa::Nfa value) : soul::rex::nfa::Nfa
        ::=
        (
            Primary(context):left{ value = left; }
            (   STAR{ value = soul::rex::nfa::Kleene(*context, value); }
            |   PLUS{ value = soul::rex::nfa::Pos(*context, value); }
            |   QUEST{ value = soul::rex::nfa::Opt(*context, value); }
            )?
        )
        {
            return value;
        }
        ;
    ...
}

4.3 Project File Syntax

parser‑project‑file project parser‑project‑name ; parser‑project‑file‑declaration *
parser‑project‑name qualified‑id
parser‑project‑file‑declaration parser‑file‑declaration
parser‑file‑declaration extern ? parser file‑path ;

A parser project file consists of the keyword project followed by a parser project name followed by a semicolon followed by a sequence of parser project file declarations.

A parser project file declaration consists of an optional extern keyword followed by the keyword parser followed by a file path followed by a semicolon.

If a parser file is declared extern, it participates in the parser file linking process but no C++ code is generated for it. An external parser file is expected to be included as a non-external parser file in another parser project that is compiled separately.

The parser project file extension is .spg.

Example

project example.parser;
parser <example_parser.parser>;