up: Table of contents | prev: Generating Parsing Errors | next: Visitor

3.5 Abstract Syntax Tree

We want to construct a tree of nodes that represents the structure of a parsed .minilang source file. By walking the tree we may for example pretty-print a minilang program, type check it, or compile it to machine code. We are not interested in the exact details of the syntax such as white space or punctuation symbols, but want just nodes that represent the structure of a minilang program. This kind of tree is called an abstract syntax tree. It is abstract because it does not represent the full details of the syntax as a parse tree does.

The strategy is to build the tree of nodes starting from the primitive syntax rules such as rules for a type and literal and working towards to the top source-file rule. For that we must be able to return information from a parsing rule to its parent rule. An attribute for a rule is required:

Attributes of Parsing Rules

We may associate two kinds of attributes to parsing rules: synthesized attributes and inherited attributes. A synthesized attribute returns information from a parsing rule to its parent parsing rule. An inherited attribute delivers information from a parent parsing rule to its child parsing rule.

We may think that a synthesized attribute is like a return value of a parsing rule and that an inherited attribute is like a parameter of a parsing rule. In fact parsing rules of a recursive descent parser are implemented as functions: The synthesized attribute of a parsing rule is included in the return value of the function that implements the parsing rule and the inherited attributes of a parsing rule become parameters of the function that implements the parsing rule.

The syntax for representing a synthesized attribute is:

        parser Example
        {
            Rule : T
        }
    

So the type T of the synthesized attribute of rule Rule is represented as a colon and a C++ type.

The syntax for representing an inherited attribute is:

        parser Example
        {
            Rule(T name)
        }
    

So the type and name of an inherited attribute of rule Rule is represented as a C++ type T and a name name within parentheses. If a parsing rule has many inherited attributes, they are separated by commas.

Node

Now it becomes clear that we want a synhesized attribute that returns a syntax tree node from a parsing rule to its parent. Let's call that attribute a Node.

We have added two source files to the examples/minilang directory and added them to the minilang project: The Tree.hpp file will contain the interface of the abstract syntax tree classes and Tree.cpp will contain their implementation.

We have created a class called Node that represents the base class of all syntax tree node classes. Here's the interface of the Node in Tree.hpp:

        namespace minilang {

        class Node
        {
        public:
            virtual ~Node();
        };

        } // namespace minilang
    

The destructor is declared virtual because it will be the root class of a class hierarchy. Here's the implementation of Node in Tree.cpp:

        #include <minilang/Tree.hpp>

        namespace minilang {

        Node::~Node()
        {
        }

        } // namespace minilang
    

Types

We start by creating syntax tree node classes for minilang types. The node classes are called IntNode, BoolNode and VoidNode and defined in Tree.hpp:

        // ...

        class IntNode : public Node
        {
        };

        class BoolNode : public Node
        {
        };

        class VoidNode : public Node
        {
        };
    

The current contents of the TypeParser.parser is as follows:

        #include <minilang/MinilangLexer.hpp>
        #include <minilang/MinilangTokens.hpp>

        using namespace MinilangTokens;

        parser TypeParser
        {
            uselexer MinilangLexer;

            Type
                ::= INT
                |   BOOL
                |   VOID
                ;
        }
    

Now we add a synthesized attribute, or a return value, to the Type rule. The type of the synthesized attribute is minilang::Node*:

        #include <minilang/MinilangLexer.hpp>
        #include <minilang/MinilangTokens.hpp>
        #include <minilang/Tree.hpp>

        using namespace MinilangTokens;

        parser TypeParser
        {
            uselexer MinilangLexer;

            Type : minilang::Node*
                ::= INT{ return new minilang::IntNode(); }
                |   BOOL{ return new minilang::BoolNode(); }
                |   VOID{ return new minilang::VoidNode(); }
                ;
        }
    

We have also attached a semantic action to the INT, BOOL and VOID tokens. A semantic action is a compound C++ statement that is bound to a syntax element. The semantic action gets executed when the input matches its preceding syntax element. The semantic actions that are bound to the INT, BOOL and VOID tokens create and return the corresponding syntax tree nodes from the Type rule.

At this point we may generate code with the spg tool:

        C:\soulng-1.0.0\examples\minilang>spg MinilangParsers.spg
    

But when we then try to build the project, the Visual Studio generates errors that point to the FunctionParser.cpp and StatementParser.cpp:

        'minilang': is not a class or namespace name	
    

We need to add an #include directive for types defined in Tree.hpp to the FunctionParser.parser and to the StatementParser.parser:

        // FunctionParser.parser:
        // ...
        #include <minilang/Tree.hpp>

        // StatementParser.parser:
        // ...
        #include <minilang/Tree.hpp>
    

Then run the spg tool again:

        C:\soulng-1.0.0\examples\minilang>spg MinilangParsers.spg
    

Now compilation is expected to succeed.

Literals

Next we create node classes for literals in Tree.hpp

        // ...

        class BooleanLiteralNode : public Node
        {
        public:
            BooleanLiteralNode(bool value_);
            bool Value() const { return value; }
        private:
            bool value;
        };

        class IntegerLiteralNode : public Node
        {
        public:
            IntegerLiteralNode(int64_t value_);
            int64_t Value() const { return value; }
        private:
            int64_t value;
        };
    

Before changes the LiteralParser.parser has the following content:

        #include <minilang/MinilangLexer.hpp>
        #include <minilang/MinilangTokens.hpp>

        using namespace MinilangTokens;

        parser LiteralParser
        {
            uselexer MinilangLexer;

            Literal 
                ::= BooleanLiteral:booleanLiteral
                |   IntegerLiteral:integerLiteral
                ;

            BooleanLiteral 
                ::= TRUE
                |   FALSE
                ;

            IntegerLiteral 
                ::= INTLIT
                ;
        }
    

The body of the Literal rule consists of two alternatives that refer to the BooleanLiteral and IntegerLiteral rules. Those rules have unique names booleanLiteral and integerLiteral respectively. Now unique names should make sense: those names represent the synthesized attributes of the BooleanLiteral and IntegerLiteral rules within the body of the Literal rule. We may also think that the Literal rule "calls" the BooleanLiteral or the IntegerLiteral rule, and those rules "return" values that have been named booleanLiteral and integerLiteral. The semantic actions return those values:

        Literal : minilang::Node*
            ::= BooleanLiteral:booleanLiteral{ return booleanLiteral; }
            |   IntegerLiteral:integerLiteral{ return integerLiteral; }
            ;
    

The changed BooleanLiteral rule contains no new things:

        BooleanLiteral : minilang::Node*
            ::= TRUE{ return new minilang::BooleanLiteralNode(true); }
            |   FALSE{ return new minilang::BooleanLiteralNode(false); }
            ;
    

But the semantic action bound to the INTLIT token in the IntegerLiteral rule has many new things:

        IntegerLiteral : minilang::Node*
            ::= INTLIT{ return new minilang::IntegerLiteralNode(minilang::ParseIntegerLiteral(lexer.FileName(), lexer.GetToken(pos))); }
            ;
    

First: each semantic action has symbols lexer, pos and span available:

Second: we have created a ParseIntegerLiteral function that parses an integer value of a matching token. It is declared in TokenValueParser.hpp as:

        int64_t ParseIntegerLiteral(const std::string& fileName, const soulng::lexer::Token& token);
    

We have added TokenValueParsers.cpp and TokenValueParsers.hpp to the project and added an #include directives for Tree.hpp and TokenValueParsers.hpp to LiteralParser.parser. Now generate sources:

        C:\soulng-1.0.0\examples\minilang>spg MinilangParsers.spg
    

When we build, the Visual Studio reports error "'minilang': is not a class or namespace name" this time pointing to the Expression.cpp, so we add an #include to the ExpressionParser.parser:

        // ExpressionParser.parser:
        // ...
        #include <minilang/Tree.hpp>
    

Then generate sources:

        C:\soulng-1.0.0\examples\minilang>spg MinilangParsers.spg
    

Now the project is expected to build again.

Identifier

We obtain the matching token from the lexer by using the pos symbol, and construct an identifier node with a string that has matched the ID token:

        Identifier : minilang::Node*
            ::= ID{ soulng::lexer::Token token = lexer.GetToken(pos); return new minilang::IdentifierNode(token.match.ToString()); }
            ;

    

The soulng::lexer::Token class has a member variable named match that has begin and end pointers to the matching characters, and a member function ToString(), that returns the match as a std::u32string.

We add an #include for Tree.hpp to IdentifierParser.parser, generate sources with spg and build. No errors expected.

Expressions

Since the parsing rules of the ExpressionParser are mutually recursive, we need to implement changes to the entire ExpressionParser.parser file before the code will compile.

First we create an enumerated type for representing operators:

        // Tree.hpp:

        enum class Operator
        {
            equal, notEqual, less, greater, lessOrEqual, greaterOrEqual, add, sub, mul, div, mod, unaryPlus, unaryMinus, not_
        };

    

Next we extend our syntax tree node hierarchy with two classes: a UnaryNode that has one child node, and a BinaryNode that has two child nodes:

        // Tree.hpp:

        class UnaryNode : public Node
        {
        public:
            UnaryNode(Node* child_);
            Node* Child() const { return child.get(); }
        private:
            std::unique_ptr<Node> child;
        };

        class BinaryNode : public Node
        {
        public:
            BinaryNode(Node* left_, Node* right_);
            Node* Left() const { return left.get(); }
            Node* Right() const { return right.get(); }
        private:
            std::unique_ptr<Node> left;
            std::unique_ptr<Node> right;
        };
    

Expression Rule

Now we start with the Expression rule. Here's the old rule:

        Expression
            ::= EqualityExpression:expr
            ;
    

The Expression rule is changed to return the synthesized attribute of the referenced EqualityExpression:

        Expression : minilang::Node*
            ::= EqualityExpression:expr{ return expr; }
            ;
    

Primary Expression Rule

The PrimaryExpression rule consists of three alternatives, a literal, an identifier and a parenthesized expression:

        PrimaryExpression
            ::= Literal:literal
            |   Identifier:identifier
            |   LPAREN Expression:expression RPAREN
            ;
    

We first extend our syntax tree node hierarchy with a node type for a parenthesized expression:

        // Tree.hpp:

        class ParenthesizedExpressionNode : public UnaryNode
        {
        public:
            ParenthesizedExpressionNode(Node* child_);
        };
    

Now the PrimaryExpression can be changed to return a node for a literal, identifier or a parenthesized expression:

        PrimaryExpression : minilang::Node*
            ::= Literal:literal{ return literal; }
            |   Identifier:identifier{ return identifier; }
            |   LPAREN Expression:expression RPAREN{ return new minilang::ParenthesizedExpressionNode(expression); }
            ;
    

Variables of Parsing Rules

Besides synthesized and inherited attributes, a parsing rule may have variables as well. The syntax for creating a variable with type T and name name to a Rule is:

        parser Example
        {
            Rule(var T name)
        }
    

A variable of a rule can be accessed from all semantic actions of the rule.

Postfix Expression Rule

We first change the Node to contain a virtual member function for adding an argument. The default implementation will throw an exception, because it is ment to be overridden:

        // Tree.hpp:

        class Node
        {
        public:
            virtual ~Node();
            virtual void AddArgument(Node* arg);
        };

        // Tree.cpp:

        void Node::AddArgument(Node* arg)
        {
            throw std::runtime_error("cannot add argument to this kind of node");
        }
    

We then add a syntax tree node for a function invocation. The AddArgument member function is overridden in it:

        // Tree.hpp:

        class InvokeNode : public UnaryNode
        {
        public:
            InvokeNode(Node* child_);
            const std::vector<std::unique_ptr<Node>>& Args() const { return args; }
            void AddArgument(Node* arg) override;
        private:
            std::vector<std::unique_ptr<Node>> args;
        };

        // Tree.cpp:

        void InvokeNode::AddArgument(Node* arg)
        {
            args.push_back(std::unique_ptr<Node>(arg));
        }
    

Now we are ready for the changes for postfix expressions. This is the old content the of PostfixExpression rule:

        PostfixExpression
            ::= PrimaryExpression:primary (LPAREN ExpressionList:args? RPAREN!)*
            ;
    

It is now extended to contain a unique pointer variable expr:

        PostfixExpression(var std::unique_ptr<minilang::Node> expr) : minilang::Node*
        // ...
    

When we have parsed a primary expression, we set the synthesized attribute of it as the contents of the expr variable:

        PostfixExpression(var std::unique_ptr<minilang::Node> expr) : minilang::Node*
            ::= 
            (
                PrimaryExpression:primary{ expr.reset(primary); } // ...
            )
            // ...
    

If the input contains an LPAREN, then when we have seen the LPAREN, we construct an InvokeNode with an old content of the expr variable. It will contain a node for the primary expression if we are parsing first function invocation, otherwise it will contain another function invocation node. We reset the expr variable to contain a new invocation node:

        PostfixExpression(var std::unique_ptr<minilang::Node> expr) : minilang::Node*
            ::= 
            (
                PrimaryExpression:primary{ expr.reset(primary); } (LPAREN{ expr.reset(new minilang::InvokeNode(expr.release())); } // ...
            )
            // ...
    

We are going to extend the ExpressionList to contain an inherited syntax tree node attribute. We are now passing the contents of the expr unique pointer variable as an argument that will be the value of inherited attribute in the ExpressionList rule:

        PostfixExpression(var std::unique_ptr<minilang::Node> expr) : minilang::Node*
            ::= 
            (
                PrimaryExpression:primary{ expr.reset(primary); } (LPAREN{ expr.reset(new minilang::InvokeNode(expr.release())); } ExpressionList(expr.get()):args? RPAREN!)*
            )
            // ...
    

Finally we release and return the released expr variable as the synthesized attribute of the postfix expression. It is going to be a node just for a primary expression, if the input has not contained a parenthesized list of arguments, or it is going to be a node for a function invocation if the input has contained a parenthesized list of arguments:

        PostfixExpression(var std::unique_ptr<minilang::Node> expr) : minilang::Node*
            ::= 
            (
                PrimaryExpression:primary{ expr.reset(primary); } (LPAREN{ expr.reset(new minilang::InvokeNode(expr.release())); } ExpressionList(expr.get()):args? RPAREN!)*
            )
            {
                return expr.release();
            }
            ;
    

Expression List Rule

Now this is the old content of the ExpressionList rule:

        ExpressionList
            ::= Expression:left (COMMA Expression:right!)*
            ;
    

First we add an include to the start of the ExpressionParser.parser file:

        [hpp]#include <minilang/Tree.hpp>
        // ...
    

[cpp] and [hpp] attribute of an include directive:

The [hpp] attribute of the include means that the generator will add the include to the generated .hpp file, not to the generated .cpp file as is the case by default. We need to add the include to the header file, because we will add an inherited attribute of syntax tree node type, and the compiler must have the declaration visible for it in the generated header.

If an include directive has a [cpp] attribute, that is the default, the include directived is added to the generated .cpp file.

The rule is: if we have a variable or a synthesized attribute of a parsing rule, the #include for its type can go to the generated .cpp file, but if we have an inherited attribute, the #include for its type must be placed to the generated .hpp file. Also if we have a main parser, #include for an attribute type must be placed to the generated .hpp file.

The changed ExpressionList rule now contains an inherited syntax tree node attribute that will in fact be an invocation node. Each expression node is added to the invocation node by calling the AddArgument member function for it:

        ExpressionList(minilang::Node* owner)
            ::= Expression:left{ owner->AddArgument(left); } (COMMA Expression:right!{ owner->AddArgument(right); })*
            ;
    

Unary Expression Rule

This is the old content of the UnaryExpression rule:

        UnaryExpression
            ::= UnaryOperator:op UnaryExpression:unaryExpr!
            |   PostfixExpression:postfixExpr
            ;
    

We extend our syntax tree node hierarchy with the node for an unary expression with an operator:

        // Tree.hpp:

        class UnaryOpExprNode : public UnaryNode
        {
        public:
            UnaryOpExprNode(Operator op_, Node* child_);
            Operator Op() const { return op; }
        private:
            Operator op;
        };
    

This the changed content of the UnaryExpression rule:

        UnaryExpression : minilang::Node*
            ::= UnaryOperator:op UnaryExpression:unaryExpr!{ return new minilang::UnaryOpExprNode(op, unaryExpr); }
            |   PostfixExpression:postfixExpr{ return postfixExpr; }
            ;
    

This the old content for an UnaryOperator rule:

        UnaryOperator
            ::= PLUS
            |   MINUS
            |   NOT
            ;
    

And this is the changed content:

        UnaryOperator : minilang::Operator
            ::= PLUS{ return minilang::Operator::unaryPlus; }
            |   MINUS{ return minilang::Operator::unaryMinus; }
            |   NOT{ return minilang::Operator::not_; }
            ;
    

Rules for Binary Expressions

We extend first our syntax tree node hierarchy with a node type for a binary expression with an operator:

        // Tree.hpp:

        class BinaryOpExprNode : public BinaryNode
        {
        public:
            BinaryOpExprNode(Operator op_, Node* left_, Node* right_);
            Operator Op() const { return op; }
        private:
            Operator op;
        };
    

Now these are the old binary expression rules:

        MultiplicativeExpression
            ::= UnaryExpression:left (MultiplicativeOperator:op UnaryExpression:right!)*
            ;

        MultiplicativeOperator
            ::= MUL
            |   DIV
            |   MOD
            ;

        AdditiveExpression
            ::= MultiplicativeExpression:left (AdditiveOperator:op MultiplicativeExpression:right!)*
            ;

        AdditiveOperator
            ::= PLUS
            |   MINUS
            ;

        RelationalExpression
            ::= AdditiveExpression:left (RelationalOperator:op AdditiveExpression:right!)*
            ;

        RelationalOperator
            ::= LESS
            |   GREATER
            |   LEQ
            |   GEQ
            ;

        EqualityExpression
            ::= RelationalExpression:left (EqualityOperator:op RelationalExpression:right!)*
            ;

        EqualityOperator
            ::= EQ
            |   NEQ
            ;
    

These are the changed rules, all go according to the same pattern:

        MultiplicativeExpression(var std::unique_ptr expr) : minilang::Node*
            ::= 
            (
                UnaryExpression:left{ expr.reset(left); } (MultiplicativeOperator:op UnaryExpression:right!{ expr.reset(new minilang::BinaryOpExprNode(op, expr.release(), right)); })*
            )
            {
                return expr.release();
            }
            ;

        MultiplicativeOperator : minilang::Operator
            ::= MUL{ return minilang::Operator::mul; }
            |   DIV{ return minilang::Operator::div; }
            |   MOD{ return minilang::Operator::mod; }
            ;

        AdditiveExpression(var std::unique_ptr expr) : minilang::Node*
            ::= 
            (
                MultiplicativeExpression:left{ expr.reset(left); } (AdditiveOperator:op MultiplicativeExpression:right!{ expr.reset(new minilang::BinaryOpExprNode(op, expr.release(), right)); })*
            )
            {
                return expr.release();
            }
            ;

        AdditiveOperator : minilang::Operator
            ::= PLUS{ return minilang::Operator::add; }
            |   MINUS{ return minilang::Operator::sub; }
            ;

        RelationalExpression(var std::unique_ptr expr) : minilang::Node*
            ::= 
            (	
                AdditiveExpression:left{ expr.reset(left); } (RelationalOperator:op AdditiveExpression:right!{ expr.reset(new minilang::BinaryOpExprNode(op, expr.release(), right)); })*
            )
            {
                return expr.release();
            }
            ;

        RelationalOperator : minilang::Operator
            ::= LESS{ return minilang::Operator::less; }
            |   GREATER{ return minilang::Operator::greater; }
            |   LEQ{ return minilang::Operator::lessOrEqual; }
            |   GEQ{ return minilang::Operator::greaterOrEqual; }
            ;

        EqualityExpression(var std::unique_ptr expr) : minilang::Node*
            ::= 
            (
                RelationalExpression:left{ expr.reset(left); } (EqualityOperator:op RelationalExpression:right!{ expr.reset(new minilang::BinaryOpExprNode(op, expr.release(), right)); })*
            )
            {
                return expr.release();
            }
            ;

        EqualityOperator : minilang::Operator
            ::= EQ{ return minilang::Operator::equal; }
            |   NEQ{ return minilang::Operator::notEqual; }
            ;
    

At this point, we may generate sources with spg:

        C:\soulng-1.0.0\examples\minilang>spg MinilangParsers.spg
    

Build is expected to succeed.

Statements

The changes for the StatementParser go according to the same pattern as before. We first extend first our syntax tree node hierarchy with node types for statements:

        class IfStatementNode : public Node
        {
        public:
            IfStatementNode(Node* condition_, Node* thenS_, Node* elseS_);
            Node* Condition() const { return condition.get(); }
            Node* ThenS() const { return thenS.get(); }
            Node* ElseS() const { return elseS.get(); }
        private:
            std::unique_ptr<Node> condition;
            std::unique_ptr<Node> thenS;
            std::unique_ptr<Node> elseS;
        };

        class WhileStatementNode : public Node
        {
        public:
            WhileStatementNode(Node* condition_, Node* statement_);
            Node* Condition() const { return condition.get(); }
            Node* Statement() const { return statement.get(); }
        private:
            std::unique_ptr<Node> condition;
            std::unique_ptr<Node> statement;
        };

        class ReturnStatementNode : public Node
        {
        public:
            ReturnStatementNode(Node* returnValue_);
            Node* ReturnValue() const { return returnValue.get(); }
        private:
            std::unique_ptr<Node> returnValue;
        };

        class ConstructionStatementNode : public Node
        {
        public:
            ConstructionStatementNode(Node* type_, IdentifierNode* variableName_, Node* value_);
            Node* Type() const { return type.get(); }
            IdentifierNode* VariableName() const { return variableName.get(); }
            Node* Value() const { return value.get(); }
        private:
            std::unique_ptr<Node> type;
            std::unique_ptr<IdentifierNode> variableName;
            std::unique_ptr<Node> value;
        };

        class AssignmentStatementNode : public Node
        {
        public:
            AssignmentStatementNode(IdentifierNode* variableName_, Node* value_);
            IdentifierNode* VariableName() const { return variableName.get(); }
            Node* Value() const { return value.get(); }
        private:
            std::unique_ptr<IdentifierNode> variableName;
            std::unique_ptr<Node> value;
        };

        class CompoundStatementNode : public Node
        {
        public:
            void AddStatement(Node* statement);
            const std::vector<std::unique_ptr<Node>>& Statements() const { return statements; }
        private:
            std::vector<std::unique_ptr<Node>> statements;
        };
    

This is the old StatementParser:

        parser StatementParser
        {
            uselexer MinilangLexer;

            using ExpressionParser.Expression;
            using TypeParser.Type;
            using IdentifierParser.Identifier;

            Statement
                ::= IfStatement:ifS
                |   WhileStatement:whileS
                |   ReturnStatement:returnS
                |   ConstructionStatement:constructionS
                |   AssignmentStatement:assignmentS
                |   CompoundStatement:compoundS
                ;

            IfStatement
                ::= IF LPAREN! Expression:condition! RPAREN! Statement:thenS! (ELSE Statement:elseS!)?
                ;

            WhileStatement
                ::= WHILE LPAREN! Expression:condition! RPAREN! Statement:statement!
                ;

            ReturnStatement
                ::= RETURN Expression:returnValue? SEMICOLON!
                ;

            ConstructionStatement
                ::= Type:type Identifier:variableName! ASSIGN! Expression:value! SEMICOLON!
                ;

            AssignmentStatement
                ::= Identifier:variableName ASSIGN! Expression:value! SEMICOLON!
                ;

            CompoundStatement
                ::= LBRACE Statement:statement* RBRACE!
                ;
        }
    

And this is the StatementParser with changes:

        parser StatementParser
        {
            uselexer MinilangLexer;

            using ExpressionParser.Expression;
            using TypeParser.Type;
            using IdentifierParser.Identifier;

            Statement : minilang::Node*
                ::= IfStatement:ifS{ return ifS; }
                |   WhileStatement:whileS{ return whileS; }
                |   ReturnStatement:returnS{ return returnS; }
                |   ConstructionStatement:constructionS{ return constructionS; }
                |   AssignmentStatement:assignmentS{ return assignmentS; }
                |   CompoundStatement:compoundS{ return compoundS; }
                ;

            IfStatement : minilang::Node*
                ::= 
                (
                    IF LPAREN! Expression:condition! RPAREN! Statement:thenS! (ELSE Statement:elseS!)?
                )
                {
                    return new minilang::IfStatementNode(condition, thenS, elseS);
                }
                ;

            WhileStatement : minilang::Node*
                ::= 
                (
                    WHILE LPAREN! Expression:condition! RPAREN! Statement:statement!
                )
                {
                    return new minilang::WhileStatementNode(condition, statement);
                }
                ;

            ReturnStatement : minilang::Node*
                ::= 
                (
                    RETURN Expression:returnValue? SEMICOLON!
                )
                {
                    return new minilang::ReturnStatementNode(returnValue);
                }
                ;

            ConstructionStatement : minilang::Node*
                ::= 
                (
                    Type:type Identifier:variableName! ASSIGN! Expression:value! SEMICOLON!
                )
                {
                    return new minilang::ConstructionStatementNode(type, variableName, value);
                }
                ;

            AssignmentStatement : minilang::Node*
                ::= 
                (
                    Identifier:variableName ASSIGN! Expression:value! SEMICOLON!
                )
                {
                    return new minilang::AssignmentStatementNode(variableName, value);
                }
                ;

            CompoundStatement(var std::unique_ptr compoundStatement) : minilang::CompoundStatementNode*
                ::= 
                (
                    LBRACE{ compoundStatement.reset(new minilang::CompoundStatementNode()); } (Statement:statement{ compoundStatement->AddStatement(statement); })* RBRACE!
                )
                {
                    return compoundStatement.release();
                }
                ;
    

Generate sources with spg:

        C:\soulng-1.0.0\examples\minilang>spg MinilangParsers.spg
    

Build should succeed.

Functions

We first create a syntax tree node type for a function:

        class FunctionNode : public Node
        {
        public:
            FunctionNode(Node* returnType_, IdentifierNode* functionName_);
            Node* ReturnType() const { return returnType.get(); }
            IdentifierNode* FunctionName() const { return functionName.get(); }
            void AddParam(ParameterNode* param);
            const std::vector<std::unique_ptr<ParameterNode>>& Parameters() const { return parameters; }
            void SetBody(CompoundStatementNode* body_);
            CompoundStatementNode* Body() const { return body.get(); }
        private:
            std::unique_ptr<Node> returnType;
            std::unique_ptr<IdentifierNode> functionName;
            std::vector<std::unique_ptr<ParameterNode>> parameters;
            std::unique_ptr<CompoundStatementNode> body;
        };
    

This is the function parsers before changes:

        parser FunctionParser
        {
            uselexer MinilangLexer;

            using TypeParser.Type;
            using IdentifierParser.Identifier;
            using StatementParser.CompoundStatement;

            Function
                ::= Type:returnType Identifier:functionName! LPAREN! ParameterList:params? RPAREN! CompoundStatement:functionBody!
                ;

            ParameterList
                ::= Parameter:left (COMMA Parameter:right!)*
                ;

            Parameter
                ::= Type:type Identifier:name
                ;
        }
    

And this is the FunctionParser with changes:

        parser FunctionParser
        {
            uselexer MinilangLexer;

            using TypeParser.Type;
            using IdentifierParser.Identifier;
            using StatementParser.CompoundStatement;

            Function(var std::unique_ptr<minilang::FunctionNode> function) : minilang::FunctionNode*
                ::= 
                (
                    Type:returnType Identifier:functionName! LPAREN!{ function.reset(new minilang::FunctionNode(returnType, functionName)); } 
                    ParameterList(function.get()):params? RPAREN! 
                    CompoundStatement:functionBody!{ function->SetBody(functionBody); }
                )
                {
                    return function.release();
                }
                ;

            ParameterList(minilang::FunctionNode* function)
                ::= Parameter:left{ function->AddParam(left); } (COMMA Parameter:right!{ function->AddParam(right); })*
                ;

            Parameter : minilang::ParameterNode*
                ::= Type:type Identifier:name
                {
                    return new minilang::ParameterNode(type, name);
                }
                ;
        }
    

Source Files

Now we have only the changes for source files left. We start with the syntax node type as usual:

        // Tree.hpp:

        class SourceFileNode : public Node
        {
        public:
            void AddFunction(FunctionNode* function);
            const std::vector<std::unique_ptr<FunctionNode>>& Functions() const { return functions; }
        private:
            std::vector<std::unique_ptr<FunctionNode>> functions;
        };
    

This is the source file parser before changes:

        parser SourceFileParser
        {
            uselexer MinilangLexer;
            main

            using FunctionParser.Function;

            SourceFile
                ::= Function:function*
                ;
        }
    

The Empty Syntax Element

The changed source file parser looks otherwise familiar, but the empty keyword is new:

        parser SourceFileParser
        {
            uselexer MinilangLexer;
            main;

            using FunctionParser.Function;

            SourceFile(var std::unique_ptr sourceFile) : minilang::SourceFileNode*
                ::= 
                (
                    empty{ sourceFile.reset(new minilang::SourceFileNode()); } (Function:function{ sourceFile->AddFunction(function); })*
                )
                {
                    return sourceFile.release();
                }
                ;
        }
    

The empty keyword can be inserted in a sequence of syntax elements in a body of a parsing rule. It represents a syntax element that will always match. The lexer is not advanced to the next input token with an empty match. Thus the empty syntax element is useful when we want to execute an action at some point in syntax no matter what the current input is. We can attach that kind of action to the empty element.

In the SourceFileParser, we first construct a SourceFileNode, and then add function nodes to it. Finally we return the source file node.

We now generate the sources with the spg tool. After that the code is expected to compile and link without errors.

        C:\soulng-1.0.0\examples\minilang>spg -v MinilangParsers.spg
        > C:/soulng-1.0.0/examples/minilang/MinilangParsers.spg
        > C:/soulng-1.0.0/examples/minilang/ExpressionParser.parser
        ==> C:/soulng-1.0.0/examples/minilang/ExpressionParser.hpp
        ==> C:/soulng-1.0.0/examples/minilang/ExpressionParser.cpp
        > C:/soulng-1.0.0/examples/minilang/FunctionParser.parser
        ==> C:/soulng-1.0.0/examples/minilang/FunctionParser.hpp
        ==> C:/soulng-1.0.0/examples/minilang/FunctionParser.cpp
        > C:/soulng-1.0.0/examples/minilang/IdentifierParser.parser
        ==> C:/soulng-1.0.0/examples/minilang/IdentifierParser.hpp
        ==> C:/soulng-1.0.0/examples/minilang/IdentifierParser.cpp
        > C:/soulng-1.0.0/examples/minilang/LiteralParser.parser
        ==> C:/soulng-1.0.0/examples/minilang/LiteralParser.hpp
        ==> C:/soulng-1.0.0/examples/minilang/LiteralParser.cpp
        > C:/soulng-1.0.0/examples/minilang/SourceFileParser.parser
        ==> C:/soulng-1.0.0/examples/minilang/SourceFileParser.hpp
        ==> C:/soulng-1.0.0/examples/minilang/SourceFileParser.cpp
        > C:/soulng-1.0.0/examples/minilang/StatementParser.parser
        ==> C:/soulng-1.0.0/examples/minilang/StatementParser.hpp
        ==> C:/soulng-1.0.0/examples/minilang/StatementParser.cpp
        > C:/soulng-1.0.0/examples/minilang/TypeParser.parser
        ==> C:/soulng-1.0.0/examples/minilang/TypeParser.hpp
        ==> C:/soulng-1.0.0/examples/minilang/TypeParser.cpp
    

Next we will implement a visitor that walks the generated syntax tree...

up: Table of contents | prev: Generating Parsing Errors | next: Visitor