Communication Between Lexer and Parser

Communication between lexer and parser goes through lexer variables.

Lexer Variables

Lexer variables are defined within a variables declaration that is contained by a lexer declaration in a .lexer file:

        lexer ExampleLexer
        {
            variables
            {
                bool foo;
                int bar;
                double baz;
            }
        }
    

A variables declaration consists of the keyword variables followed by C++ variable declarations inside braces.

The lexer variables allow communication from a lexer to a parser, or from a parser to a lexer.

Communication from a Lexer to a Parser

A lexer may set a variable in a semantic action before returning a token identifier:

        // lexervars.lexer:

        #include <TokenValueParsers.hpp>

        classmap ExampleClassMap;

        tokens ExampleTokens
        {
            (INTLIT, "integer literal")
        }

        expressions
        {
            intlit = "[0-9]+";
        }

        lexer ExampleLexer
        {
            "{intlit}" { integerValue = ParseIntegerLiteral(token.match); return INTLIT; }

            variables
            {
                int integerValue;
            }
        }
    

Then a parser may use the variable in a semantic action of a parsing rule:

        #include <ExampleLexer.hpp>
        #include <ExampleTokens.hpp>

        using namespace ExampleTokens;

        parser ExampleParser
        {
            uselexer ExampleLexer;
            main;

            IntegerLiteral : int
                ::= INTLIT{ return lexer.integerValue; }
                ;
        }
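
The data flow in the two listings above can be pictured in plain C++. The following is only a sketch under stated assumptions: VariableLexer, OnIntegerLiteral and ParseIntegerLiteralRule are hypothetical names invented for illustration, std::stoi stands in for ParseIntegerLiteral, and the code that the generator actually emits may look different. It only shows that a lexer variable behaves like a member of the lexer object that the lexer action writes and the parser action reads.

```cpp
#include <string>

// Hypothetical stand-in for a generated lexer class (an assumption, not generator output).
struct VariableLexer
{
    // Corresponds to "int integerValue;" in the variables declaration.
    int integerValue = 0;

    // Plays the role of the semantic action attached to the {intlit} pattern.
    // std::stoi is used here in place of ParseIntegerLiteral.
    void OnIntegerLiteral(const std::string& match)
    {
        integerValue = std::stoi(match);
    }
};

// Plays the role of the semantic action of the IntegerLiteral rule:
// the parser reads the value that the lexer stored.
int ParseIntegerLiteralRule(const VariableLexer& lexer)
{
    return lexer.integerValue;
}
```

In this sketch, calling OnIntegerLiteral("42") on a VariableLexer and then ParseIntegerLiteralRule on the same object yields 42.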
    

Communication from a Parser to a Lexer

A parser may also set a variable in a semantic action of a parsing rule to be read by the lexer:

        #include <IncludeLexer.hpp>
        #include <IncludeTokens.hpp>

        using namespace IncludeTokens;

        parser IncludeParser
        {
            uselexer IncludeLexer;
            main;

            Choice
                ::= IncludeDirective:includeDirective
                |   Expression:expression
                ;

            IncludeDirective
                ::= empty{ lexer.parsingIncludeDirective = true; }
                    (HASH INCLUDE FILEPATH){ lexer.parsingIncludeDirective = false; } / { lexer.parsingIncludeDirective = false; }
                ;

            Expression
                ::= ID LANGLE ID
                ;
        }
    

The parser sets the lexer flag variable parsingIncludeDirective when it starts to parse an include directive, and resets the flag when it has finished parsing the include directive, whether the directive matched or not.
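
For comparison, a hand-written parser could give the same guarantee with a scope guard. This is a minimal sketch, assuming a hypothetical FlagLexer type and a hypothetical IncludeDirectiveScope helper; neither is part of the generated code. The bodyMatches parameter merely simulates whether HASH INCLUDE FILEPATH matched.

```cpp
// Hypothetical stand-in for the generated lexer; only the flag name comes from the text.
struct FlagLexer
{
    bool parsingIncludeDirective = false;
};

// Hypothetical scope guard: sets the flag on entry and clears it on every exit path.
class IncludeDirectiveScope
{
public:
    explicit IncludeDirectiveScope(FlagLexer& lexer_) : lexer(lexer_)
    {
        lexer.parsingIncludeDirective = true;
    }
    ~IncludeDirectiveScope()
    {
        lexer.parsingIncludeDirective = false;
    }
    IncludeDirectiveScope(const IncludeDirectiveScope&) = delete;
    IncludeDirectiveScope& operator=(const IncludeDirectiveScope&) = delete;
private:
    FlagLexer& lexer;
};

// Stand-in for the IncludeDirective rule.
// flagWhileParsing records what the lexer would observe while the rule is active.
bool ParseIncludeDirective(FlagLexer& lexer, bool bodyMatches, bool& flagWhileParsing)
{
    IncludeDirectiveScope scope(lexer);
    flagWhileParsing = lexer.parsingIncludeDirective;
    return bodyMatches; // the guard clears the flag for both outcomes
}
```

The two semantic actions after the parenthesized group in the grammar play the role of the destructor here: one runs when the group matches, the other when it does not.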

Conditional Actions

Then the lexer may return a different sequence of tokens to the parser when parsing an include directive:

        classmap IncludeClassMap;

        tokens IncludeTokens
        {
            (FILEPATH, "file path"), (HASH, "'#'"), (LANGLE, "'<'"), (RANGLE, "'>'"), (INCLUDE, "'include'"), (ID, "identifier")
        }

        keywords IncludeKeywords
        {
            ("include", INCLUDE)
        }

        expressions
        {
            ws = "[\n\r\t ]";
            separators = "{ws}+";
            id = "{idstart}{idcont}*";
            filePath = "<[^\n>]+>";
        }

        lexer IncludeLexer
        {
            "{separators}" {}
            "{id}" { int kw = GetKeywordToken(token.match); if (kw == INVALID_TOKEN) return ID; else return kw; }
            "#" { return HASH; }
            "<"{ return LANGLE; }
            ">"{ return RANGLE; }
            "{filePath}" $(1) { return FILEPATH; }

            variables
            {
                bool parsingIncludeDirective;
            }

            actions
            {
                $(1) = { if (!parsingIncludeDirective) return INVALID_TOKEN; }
            }
        }
    

The lexer has a Boolean variable parsingIncludeDirective that is set by the parser, and a conditional action that checks this flag. By returning INVALID_TOKEN from a conditional action, the lexer rejects the current match and returns the token that matched before it.

When the filePath pattern is matched, the lexer checks whether the parser is parsing an include directive. If that is the case, it returns the FILEPATH token; otherwise it returns the token that matched before. Because the filePath pattern starts with a left angle bracket, the token that matched before is the LANGLE token.

This means that when parsing the string #include <file.hpp>, the lexer returns the token sequence HASH INCLUDE FILEPATH to the parser, but when parsing the string a < b, the lexer returns the token sequence ID LANGLE ID to the parser.
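
The effect of the conditional action can be imitated with a small hand-written scanner. The following is a sketch under stated assumptions, not the generated lexer: IncludeSketchLexer and Scan are hypothetical names, identifiers are simplified to ASCII letters, digits and underscores, and the flag is fixed for the whole input instead of being toggled by a parser.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum Token { HASH, INCLUDE, FILEPATH, LANGLE, RANGLE, ID };

// Hand-written stand-in for the generated lexer (hypothetical).
struct IncludeSketchLexer
{
    std::string input;
    std::size_t pos = 0;
    bool parsingIncludeDirective = false;

    static bool IsSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool IsIdStart(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_'; }
    static bool IsIdCont(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_'; }

    // Stores the next token in tok; returns false at end of input or on an unknown character.
    bool Next(Token& tok)
    {
        while (pos < input.size() && IsSpace(input[pos])) ++pos;
        if (pos >= input.size()) return false;
        char c = input[pos];
        if (c == '#') { ++pos; tok = HASH; return true; }
        if (c == '>') { ++pos; tok = RANGLE; return true; }
        if (c == '<')
        {
            // Try the longer match "<[^\n>]+>" first.
            std::size_t end = pos + 1;
            while (end < input.size() && input[end] != '>' && input[end] != '\n') ++end;
            bool filePathMatched = end < input.size() && input[end] == '>' && end > pos + 1;
            // The flag test corresponds to the conditional action $(1).
            if (filePathMatched && parsingIncludeDirective)
            {
                pos = end + 1;
                tok = FILEPATH;
                return true;
            }
            // Match rejected or absent: fall back to the shorter match "<".
            ++pos;
            tok = LANGLE;
            return true;
        }
        if (IsIdStart(c))
        {
            std::size_t start = pos;
            while (pos < input.size() && IsIdCont(input[pos])) ++pos;
            tok = input.substr(start, pos - start) == "include" ? INCLUDE : ID;
            return true;
        }
        return false;
    }
};

// Scans the whole text with the flag held at a fixed value.
std::vector<Token> Scan(const std::string& text, bool parsingIncludeDirective)
{
    IncludeSketchLexer lexer;
    lexer.input = text;
    lexer.parsingIncludeDirective = parsingIncludeDirective;
    std::vector<Token> tokens;
    Token tok;
    while (lexer.Next(tok)) tokens.push_back(tok);
    return tokens;
}
```

With this sketch, Scan("#include <file.hpp>", true) yields HASH INCLUDE FILEPATH and Scan("a < b", false) yields ID LANGLE ID. An input such as a < b > c shows the flag at work: with the flag cleared it scans as ID LANGLE ID RANGLE ID, although the text between the angle brackets would match the filePath pattern.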

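This does not work in all situations, because the lexer saves the tokens it has scanned and returns the same sequence of tokens to the parser if the parser needs to backtrack. A token that was scanned while a flag had one value is therefore not scanned again after the flag has changed. For example, you may not be able to have a word be a keyword in one context but not in another.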
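
The limitation can be illustrated with a toy token cache. This is a sketch only: CachingTokenSource and RereadAfterFlagChange are hypothetical names, tokens are plain integers, and the real lexer's buffering may be organized differently. It shows that a token scanned under one flag value is replayed unchanged after backtracking, even though the flag has changed in the meantime.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical caching layer between a scanner and a parser.
struct CachingTokenSource
{
    std::function<int()> scan;   // produces a fresh token when the cache is exhausted
    std::vector<int> cache;      // tokens scanned so far
    std::size_t pos = 0;         // index of the next token to hand to the parser

    int Next()
    {
        if (pos == cache.size()) cache.push_back(scan());
        return cache[pos++];
    }

    // Rewinds to an earlier token index; cached tokens are not rescanned.
    void Backtrack(std::size_t to) { pos = to; }
};

// Returns the token seen when the first token is read again after the flag has changed.
int RereadAfterFlagChange()
{
    bool flag = true;
    CachingTokenSource source;
    source.scan = [&flag]() { return flag ? 1 : 2; };
    int first = source.Next();   // scanned while flag is true
    (void)first;
    flag = false;                // the parser changes the flag
    source.Backtrack(0);         // and then backtracks
    return source.Next();        // served from the cache, not rescanned
}
```

In this sketch RereadAfterFlagChange returns 1, the token produced under the old flag value, not 2.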

A conditional action is declared in an actions declaration within a lexer declaration. The declaration of an action consists of the character '$' followed by the number of the action in parentheses, the assignment symbol '=', and a C++ compound statement. A conditional action is used by placing the '$' character followed by the number of the action in parentheses between a token pattern and its semantic action within the lexer declaration.