Operation of the Parser Generator

The parser generator spg generates a source and a header file for each .parser file with the same base name as the .parser file. For each parser in a .parser file, the generator produces a C++ class with a static member function for each parsing rule of the parser. The first parameter of the static member function is of the type defined in the uselexer declaration of the parser. There's additional parameters for each inherited attribute of a parsing rule. Each static member function returns a soulng::parser::Match object. The Match object contains a Boolean member variable hit that is set to the value true if the rule matched, and to the value false otherwise. The Match object also contains a void* member variable value, that is used for returning the synthesized attribute of the rule, if any.

The body of a parsing rule consists of parsing expressions for which spg generates ordinary procedural code with nested scopes. Each scope contains an object of type Match with the name match and for many parsing expressions also a pointer variable of type Match* with the name parentMatchi that is set to point to the match object. The index i is a serial number starting from 0. The parentMatchi pointer variable is used for setting a match variable in an outer scope to the value of a match variable in an inner scope.

Debug Parser

To show the operation of spg, we have made a parser that has each kind of parsing expression in the body of its rules. Here's the skeleton of the DebugParser.parser:

        // DebugParser.parser

        #include <DebugLexer.hpp>
        #include <DebugTokens.hpp>
        #include <DebugAST.hpp>
        #include <iostream>

        using namespace DebugTokens;

        parser DebugParser
        {
            uselexer DebugLexer;
            main;

            Token
                ::= ID
                ;

            // ...
        }
    

Parse Function

The first rule of the parser is a rule called Token. Since the DebugParser has a main declaration, spg generates a Parse function that calls the Token rule:

        void DebugParser::Parse(DebugLexer& lexer)
        {
            ++lexer;
            int pos = lexer.GetPos();
            soulng::parser::Match match = DebugParser::Token(lexer);
            if (match.hit)
            {
                if (*lexer == soulng::lexer::END)
                {
                    return;
                }
                else
                {
                    lexer.ThrowExpectationFailure(lexer.GetPos(), ToUtf32(GetTokenInfo(soulng::lexer::END)));
                }
            }
            else
            {
                lexer.ThrowExpectationFailure(pos, U"Token");
            }
            return;
        }
    

The Parse function first advances the lexer to the first token by calling ++lexer. Then it calls the Token member function with the lexer argument. If the Token rule matched, and the whole input has been consumed, the function just returns, because the Token rule has no synthesized attribute to return. The whole input has been consumed, if the current token after parsing the Token rule is soulng::lexer::END token. If the whole input has not been consumed, the parser throws std::runtime_error with message "end of input expected".

If the Token rule did not match, the parser throws std::runtime_error with message "Token expected". We have could have provided a different informative string for the Token rule by using the ruleinfo declaration.

Code Generated for Parsing Expressions

We now look at the C++ code generated for each kind of parsing expression.

Token Expression

For a token-expression with ID as a token-id:

        // ...

        Token
            ::= ID
            ;

        // ...

    

spg generates the following code:

        soulng::parser::Match DebugParser::Token(DebugLexer& lexer)
        {
            soulng::parser::Match match(false);
            if (*lexer == ID)
            {
                ++lexer;
                match.hit = true;
            }
            return match;
        }

    

First a match object with the hit member set to false is constructed. Then if the current lexer token is ID, the lexer is advanced to the next token and the hit member is changed to true, otherwise the lexer is not advanced, and the hit member stays false. In any case the parser returns the match object.

Rule-Call Expressions

For a rule-call expression with rule-name as Token and unique name as token:

        // ...

        RuleCall1
            ::= Token:token
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::RuleCall1(DebugLexer& lexer)
        {
            soulng::parser::Match match = DebugParser::Token(lexer);
            return match;
        }
    

Since the Token rule has no synthesized attribute, the unique name token is not used. The parser simply calls the Token rule with the lexer argument, and returns the match object returned by the Token rule.

For a rule-call expression with rule-name as Synthesized, that returns a value type, and unique name set as synthesized:

        // ...

        Synthesized : int
            ::= empty{ return 1; }
            ;

        RuleCall2 : int
            ::=  Synthesized:synthesized{ return synthesized; }
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::Synthesized(DebugLexer& lexer)
        {
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            {
                int pos = lexer.GetPos();
                soulng::parser::Match match(true);
                if (match.hit)
                {
                    return soulng::parser::Match(true, new soulng::parser::Value<int>(1));
                }
                *parentMatch0 = match;
            }
            return match;
        }

        soulng::parser::Match DebugParser::RuleCall2(DebugLexer& lexer)
        {
            std::unique_ptr<soulng::parser::Value<int>> synthesized;
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            {
                int pos = lexer.GetPos();
                soulng::parser::Match match = DebugParser::Synthesized(lexer);
                synthesized.reset(static_cast<soulng::parser::Value<int>*>(match.value));
                if (match.hit)
                {
                    return soulng::parser::Match(true, new soulng::parser::Value<int>(synthesized->value));
                }
                *parentMatch0 = match;
            }
            return match;
        }
    

Since the called Synthesized rule has a synthesized attribute, a local variable for it is generated in the RuleCall2 function. The name of the local variable, synthesized, is the unique name given to the synthesized attribute in the call of the Synthesized rule. If the Synthesized rule matches the parser returns a Match object with hit member set true, and a the value member set as the value returned by the Synthesized rule.

For a rule-call expression with rule-name as PtrSynthesized, that returns a pointer type, and unique name set as ptrSynthesized:

        // ...

        PtrSynthesized : Node*
            ::= empty{ return new Node(); }
            ;

        RuleCall3 : Node*
            ::= PtrSynthesized:ptrSynthesized{ return ptrSynthesized; }
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::PtrSynthesized(DebugLexer& lexer)
        {
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            {
                int pos = lexer.GetPos();
                soulng::parser::Match match(true);
                if (match.hit)
                {
                    return soulng::parser::Match(true, new Node);
                }
                *parentMatch0 = match;
            }
            return match;
        }

        soulng::parser::Match DebugParser::RuleCall3(DebugLexer& lexer)
        {
            std::unique_ptr<Node> ptrSynthesized;
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            {
                int pos = lexer.GetPos();
                soulng::parser::Match match = DebugParser::PtrSynthesized(lexer);
                ptrSynthesized.reset(static_cast<Node*>(match.value));
                if (match.hit)
                {
                    return soulng::parser::Match(true, ptrSynthesized.release());
                }
                *parentMatch0 = match;
            }
            return match;
        }

    

This time, since the synthesized attribute is of a pointer type, the local variable, ptrSynthesized, holding a unique pointer to the synthesized attribute of the PtrSynthesized rule, is reset after a call to PtrSynthesized and released when returned from the RuleCall3 function.

Alternative Expression

For an alternative expression:

        A   ::= Token:token
            ;

        B   ::= Token:token
            ;

    Alternative
        ::= A:a | B:b
        ;
    

spg generates the following code:

        soulng::parser::Match DebugParser::A(DebugLexer& lexer)
        {
            soulng::parser::Match match = DebugParser::Token(lexer);
            return match;
        }

        soulng::parser::Match DebugParser::B(DebugLexer& lexer)
        {
            soulng::parser::Match match = DebugParser::Token(lexer);
            return match;
        }

        soulng::parser::Match DebugParser::Alternative(DebugLexer& lexer)
        {
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            {
                int save = lexer.GetPos();
                soulng::parser::Match match = DebugParser::A(lexer);
                *parentMatch0 = match;
                if (!match.hit)
                {
                    soulng::parser::Match match(false);
                    soulng::parser::Match* parentMatch1 = &match;
                    lexer.SetPos(save);
                    {
                        soulng::parser::Match match = DebugParser::B(lexer);
                        *parentMatch1 = match;
                    }
                    *parentMatch0 = match;
                }
            }
            return match;
        }
    

First an outer match object with value false is constructed. Then the parser saves the current position of the lexer to an integer variable save. Then the first alternative, A, is parsed and the result is set to the outer match object. If A matches the outer match object is returned. Otherwise the parser backtracks the input to the saved position by calling lexer.SetPos with save, parses the second alternative B, sets the result to the outer match object, and returns the result.

Sequence Expression

For a sequence expression:

        // ...

        Sequence
            ::= A:a B:b
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::Sequence(DebugLexer& lexer)
        {
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            {
                soulng::parser::Match match = DebugParser::A(lexer);
                *parentMatch0 = match;
            }
            if (match.hit)
            {
                soulng::parser::Match match(false);
                soulng::parser::Match* parentMatch1 = &match;
                {
                    soulng::parser::Match match = DebugParser::B(lexer);
                    *parentMatch1 = match;
                }
                *parentMatch0 = match;
            }
            return match;
        }
    

First an outer match object with value false is constructed. Then the first member of the sequence, A, is parsed and the result is set to the outer match object. If A matches the second member of the sequence, B, is parsed and the result is set to the outer match object. In any case the outer match object that contains the result is returned.

Difference Expression

For a difference expression:

        // ...

        Difference
            ::= A:a - B:b
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::Difference(DebugLexer& lexer)
        {
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            int save = lexer.GetPos();
            {
                soulng::parser::Match match = DebugParser::A(lexer);
                *parentMatch0 = match;
            }
            if (match.hit)
            {
                soulng::parser::Match match(false);
                soulng::parser::Match* parentMatch1 = &match;
                {
                    int tmp = lexer.GetPos();
                    lexer.SetPos(save);
                    save = tmp;
                    soulng::parser::Match match = DebugParser::B(lexer);
                    *parentMatch1 = match;
                }
                if (!match.hit)
                {
                    lexer.SetPos(save);
                }
                *parentMatch0 = soulng::parser::Match(!match.hit, match.value);
            }
            return match;
        }
    

First an outer match object with value false is constructed. Then the parser saves the current position of the lexer to an integer variable save. Then the first member of the difference, A, is parsed and the result is set to the outer match object. If A matches, the current end position after parsing A is saved to an integer variable tmp, the input is backtracked to the previously saved position by calling lexer.SetPos with save, then the tmp value is set to the save value. Now the second member of the difference, B, is parsed. If B did not match, the input position is set to the position where it was after parsing A, and a successful match object is set to the outer match object. Otherwise, if B matched, an unsuccessful match object is set to the outer match object. In any case the outer match object that contains the result is returned.

List Expression

For a list expression:

        // ...

        List
            ::= A:a % B:b
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::List(DebugLexer& lexer)
        {
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            {
                soulng::parser::Match match = DebugParser::A(lexer);
                *parentMatch0 = match;
            }
            if (match.hit)
            {
                soulng::parser::Match match(false);
                soulng::parser::Match* parentMatch1 = &match;
                {
                    soulng::parser::Match match(true);
                    soulng::parser::Match* parentMatch2 = &match;
                    {
                        while (true)
                        {
                            int save = lexer.GetPos();
                            {
                                soulng::parser::Match match(false);
                                soulng::parser::Match* parentMatch3 = &match;
                                {
                                    soulng::parser::Match match = DebugParser::B(lexer);
                                    *parentMatch3 = match;
                                }
                                if (match.hit)
                                {
                                    soulng::parser::Match match(false);
                                    soulng::parser::Match* parentMatch4 = &match;
                                    {
                                        soulng::parser::Match match = DebugParser::A(lexer);
                                        *parentMatch4 = match;
                                    }
                                    *parentMatch3 = match;
                                }
                                if (match.hit)
                                {
                                    *parentMatch2 = match;
                                }
                                else
                                {
                                    lexer.SetPos(save);
                                    break;
                                }
                            }
                        }
                    }
                    *parentMatch1 = match;
                }
                *parentMatch0 = match;
            }
            return match;
        }
    

The list expression A % B is an abbreviation for an expression A (B A)*. The generated code is for the longer expression.

Closure Expression

For a closure expression:

        // ...

        Closure
            ::= A:a*
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::Closure(DebugLexer& lexer)
        {
            soulng::parser::Match match(true);
            soulng::parser::Match* parentMatch0 = &match;
            {
                while (true)
                {
                    int save = lexer.GetPos();
                    {
                        soulng::parser::Match match = DebugParser::A(lexer);
                        if (match.hit)
                        {
                            *parentMatch0 = match;
                        }
                        else
                        {
                            lexer.SetPos(save);
                            break;
                        }
                    }
                }
            }
            return match;
        }
    

First an outer match object with value true is constructed. The parser then saves the last successful input position to the integer variable save. Then A is parsed. If A matched, the result is set to the outer match object. Otherwise the input is backtracked to the last successfully parsed input position in the variable save. In any case the outer match object with a successful result is returned.

Positive Closure Expression

For a positive closure expression:

        // ...

        Positive
            ::= A:a+
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::Positive(DebugLexer& lexer)
        {
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            {
                soulng::parser::Match match = DebugParser::A(lexer);
                *parentMatch0 = match;
            }
            if (match.hit)
            {
                soulng::parser::Match match(true);
                soulng::parser::Match* parentMatch1 = &match;
                while (true)
                {
                    int save = lexer.GetPos();
                    {
                        soulng::parser::Match match = DebugParser::A(lexer);
                        if (match.hit)
                        {
                            *parentMatch1 = match;
                        }
                        else
                        {
                            lexer.SetPos(save);
                            break;
                        }
                    }
                }
            }
            return match;
        }
    

First an outer match object with value false is constructed. Then the first occurrence of A is parsed, and the result is set to the outer match object. If the first occurrence matched, then the generated code is the same as for the A* expression. In any case the outer match object with the result is returned.

Optional Expression

For a optional expression:

        // ...

        Optional
            ::= A:a?
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::Optional(DebugLexer& lexer)
        {
            soulng::parser::Match match(true);
            int save = lexer.GetPos();
            soulng::parser::Match* parentMatch0 = &match;
            {
                soulng::parser::Match match = DebugParser::A(lexer);
                if (match.hit)
                {
                    *parentMatch0 = match;
                }
                else
                {
                    lexer.SetPos(save);
                }
            }
            return match;
        }
    

First an outer match object with value true is constructed. Then the current input position is saved to an integer variable save. Then A is parsed. If A matched, the result is set to the outer match object. Otherwise the parser backtracks the input to the saved position. In any case the outer match object with a successful result is returned.

Empty Expression

For an empty expression:

        // ...

        Empty
            ::= empty
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::Empty(DebugLexer& lexer)
        {
            soulng::parser::Match match(true);
            return match;
        }
    

The parser constructs and returns a successful match object.

Grouping Expression

For a grouping expression:

        // ...

        Grouping
            ::= (A:a B:b)
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::Grouping(DebugLexer& lexer)
        {
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            {
                soulng::parser::Match match = DebugParser::A(lexer);
                *parentMatch0 = match;
            }
            if (match.hit)
            {
                soulng::parser::Match match(false);
                soulng::parser::Match* parentMatch1 = &match;
                {
                    soulng::parser::Match match = DebugParser::B(lexer);
                    *parentMatch1 = match;
                }
                *parentMatch0 = match;
            }
            return match;
        }
    

The code is the same as for the contained expression.

Expectation Expression

For a expectation expression:

        // ...

        Expectation
            ::= A:a!
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::Expectation(DebugLexer& lexer)
        {
            soulng::parser::Match match(true);
            soulng::parser::Match* parentMatch0 = &match;
            {
                int pos = lexer.GetPos();
                soulng::parser::Match match = DebugParser::A(lexer);
                if (match.hit)
                {
                    *parentMatch0 = match;
                }
                else
                {
                    lexer.ThrowExpectationFailure(pos, U"A");
                }
            }
            return match;
        }
    

First an outer match object with value true is constructed. The parser then parses A. If A matches, the result is set to the outer match object, and the successful result is returned. Otherwise the parser throws a std::runtime_error exception with message "A expected".

Semantic Actions

For a action expression:

        // ...

        Action
            ::= A:a{ std::cout << "A" << std::endl; } / { std::cout << "~A" << std::endl; }
            ;

        // ...
    

spg generates the following code:

        soulng::parser::Match DebugParser::Action(DebugLexer& lexer)
        {
            soulng::parser::Match match(false);
            soulng::parser::Match* parentMatch0 = &match;
            {
                int pos = lexer.GetPos();
                soulng::parser::Match match = DebugParser::A(lexer);
                if (match.hit)
                {
                    std::cout << "A" << std::endl;
                }
                else
                {
                    std::cout << "~A" << std::endl;
                }
                *parentMatch0 = match;
            }
            return match;
        }
    

First an outer match object with value false is constructed. Then the parser parses A. If A matched the semantic action for a successful match is executed. Otherwise the semantic action for an unsuccessful match, if any, is executed. In any case the result of parsing A is set to the outer match object and returned.