up: Table of contents | prev: Testing the Lexer | next: Building the Parsers

3.1 Writing Parsers

We will convert each Minilang grammar table to a parser.

The parser descriptions are contained in .parser files that are read by the SoulNG parser generator spg.

Types

We start with the types:

type int | bool | void

The names of the types, int, bool and void, are keywords in the language, and the lexical analyzer has corresponding tokens INT, BOOL and VOID that it returns to the parser, so the conversion is easy

Footnote 1
        parser TypeParser
        {
            Type
                ::= INT
                |   BOOL
                |   VOID
                ;
        }
    

A parser declaration consists of the keyword parser followed by the name of the parser followed by the definition of the parser enclosed in braces. The parser definition consists of parsing rules each of which consists of a name followed by the "produces" symbol, "::=", followed by the body of the rule and terminated by the semicolon.

In the body of a rule lexer tokens are referred directly by name, that is typically written in capital letters. The alternatives are separated by the '|' symbol.

Literals

The grammar for literals has more complex structure:

literal boolean‑literal | integer‑literal
boolean‑literal true | false
integer‑literal digit‑sequence
digit‑sequence [0-9]+

We start with the literal grammar rule. This rule refers to other grammar rules boolean-literal and integer-literal. That can be directly expressed in the parser:

        parser LiteralParser
        {
            Literal
                ::= BooleanLiteral:booleanLiteral
                |   IntegerLiteral:integerLiteral
                ;
        }
    

Unique Names

The Literal parsing rule above refers to parsing rules BooleanLiteral and IntegerLiteral. When a parsing rule is referenced in the body of another parsing rule, it must be given a unique name within the body of that rule for the reasons that will become clear in the future. The syntax for naming is ParsingRule:uniqueName, that is: separate the unique name uniqueName of rule ParsingRule with a colon. So when the Literal parsing rule refers to BooleanLiteral parsing rule, the reference to BooleanLiteral rule has been given unique name booleanLiteral, and the reference to IntegerLiteral rule has been given unique name integerLiteral.

The grammar rule boolean-literal contains just keywords, so the lexer has corresponding tokens for them:

        parser LiteralParser
        {
            // ..

            BooleanLiteral
                ::= TRUE
                |   FALSE
                ;
        }
    

The lexical analyzer has a rule integer_literal that returns token INTLIT. We will therefore define corresponding parsing rule IntegerLiteral, that consists of just that INTLIT token:

        parser LiteralParser
        {
            // ..

            IntegerLiteral
                ::= INTLIT
                ;
        }
    

This is the whole parser for the literal grammar:

        parser LiteralParser
        {
            Literal
                ::= BooleanLiteral:booleanLiteral
                |   IntegerLiteral:integerLiteral
                ;

            BooleanLiteral
                ::= TRUE
                |   FALSE
                ;

            IntegerLiteral
                ::= INTLIT
                ;
        }
    

Identifiers

The identifier parser will be just as easy as the type parser:

identifier Unicode identifier

The lexer has rule identifier that returns the ID token for identifiers:

        parser IdentifierParser
        {
            Identifier
                ::= ID
                ;
        }
    

Expressions

We start converting grammars for expression from the grammar for primary-expression:

Primary Expression

primary‑expression literal | identifier | ( expression )

In the body of a parsing rule we may also refer to parsing rules defined in other parsers. For that, we must include a using declaration for the foreign rule in the definition of a parser:

        parser ExpressionParser
        {
            using LiteralParser.Literal;
            using IdentifierParser.Identifier;

            Expresssion
                ::= empty
                ;

            PrimaryExpression
                ::= Literal:literal
                |   Identifier:identifier
                |   LPAREN Expression:expression RPAREN
                ;
        }
    

A using declaration consists of the keyword using followed by the name of a parser followed by '.' followed by the name of a rule defined in that parser. We have included using declarations for the Literal rule defined in the LiteralParser parser and the for the Identifier rule defined in the IdentifierParser parser.

We have temporarily defined a rule Expression whose body consists just of the keyword empty in order to refer to the Expression rule in the body of the PrimaryExpression rule.

The body of the PrimaryExpression rule consists of alternatives for Literal, Identifier and parenthesized Expression. A sequence consisting of tokens and references to parsing rules can be expression by separating the elements of the sequence with white space.

Postfix Expression

The postfix-expression grammar rule has some repeating and optional elements:

postfix‑expression primary‑expression (( expression‑list? ))*
expression-list expression (, expression)*

In a body of a parsing rule the '?' symbol is used for optional elements, the '*' symbol for a Kleene closure of elements and the parentheses are used for grouping elements:

        parser ExpressionParser
        {
            // ...

            PostfixExpression
                ::= PrimaryExpression:primary (LPAREN ExpressionList:args? RPAREN)*
                ;

            ExpressionList
                ::= Expression:left (COMMA Expression:right)*
                ;
        }
    

The closure is applied to the whole LPAREN ExpressionList RPAREN sequence, so those are enclosed in parentheses. Also the sequence COMMA Expression may be repeating, so it is parenthesized.

Unary Expression

The unary-expression grammar rule is recursive:

unary‑expression (+ | - | !) unary‑expression | postfix‑expression

We have separated the unary operator to its own parsing rule, although it is not mandatory:

        parser ExpressionParser
        {
            // ...

            UnaryExpression
                ::= UnaryOperator:op UnaryExpression:unaryExpr
                |   PostfixExpression:postfixExpr
                ;

            UnaryOperator
                ::= PLUS
                |   MINUS
                |   NOT
                ;
        }
    

Multiplicative Expression

A multiplicative-expression consists of unary-expressions separated by multiplicative operator symbols:

multiplicative‑expression unary‑expression ((* | / | %) unary‑expression)*

In this case we have again separated the operator to its own rule:

        parser ExpressionParser
        {
            // ...

            MultiplicativeExpression
                ::= UnaryExpression:left (MultiplicativeOperator:op UnaryExpression:right)*
                ;

            MultiplicativeOperator
                ::= MUL
                |   DIV
                |   MOD
                ;
        }
    

Additive Expression

The addive-expression has the same structure as the multiplicative-expression:

additive‑expression multiplicative‑expression ((+ | -) multiplicative‑expression)*

Nothing new:

        parser ExpressionParser
        {
            // ...

            AdditiveExpression
                ::= MultiplicativeExpression:left (AdditiveOperator:op MultiplicativeExpression:right)*
                ;

            AdditiveOperator
                ::= PLUS
                |   MINUS
                ;
        }
    

Relational Expression

Same structure, just different symbols and different precedence level:

relational‑expression additive‑expression ((< | > | <= | >=) additive‑expression)*
        parser ExpressionParser
        {
            // ...
        
            RelationalExpression
                ::= AdditiveExpression:left (RelationalOperator:op AdditiveExpression:right)*
                ;

            RelationalOperator
                ::= LESS
                |   GREATER
                |   LEQ
                |   GEQ
                ;
        }
    

Equality Expression

And finally the equality expression:

equality‑expression relational‑expression ((== | !=) relational‑expression)*
        parser ExpressionParser
        {
            // ...
        
            EqualityExpression
                ::= RelationalExpression:left (EqualityOperator:op RelationalExpression:right)*
                ;

            EqualityOperator
                ::= EQ
                |   NEQ
                ;
        }
    

Expression

The equality expression completes the loop:

expression equality‑expression

Here's the whole expression parser:

        parser ExpressionParser
        {
            using LiteralParser.Literal;
            using IdentifierParser.Identifier;

            Expression
                ::= EqualityExpression:expr
                ;

            PrimaryExpression
                ::= Literal:literal
                |   Identifier:identifier
                |   LPAREN Expression:expression RPAREN
                ;

            PostfixExpression
                ::= PrimaryExpression:primary (LPAREN ExpressionList:args? RPAREN)*
                ;

            ExpressionList
                ::= Expression:left (COMMA Expression:right)*
                ;

            UnaryExpression
                ::= UnaryOperator:op UnaryExpression:unaryExpr
                |   PostfixExpression:postfixExpr
                ;

            UnaryOperator
                ::= PLUS
                |   MINUS
                |   NOT
                ;

            MultiplicativeExpression
                ::= UnaryExpression:left (MultiplicativeOperator:op UnaryExpression:right)*
                ;

            MultiplicativeOperator
                ::= MUL
                |   DIV
                |   MOD
                ;

            AdditiveExpression
                ::= MultiplicativeExpression:left (AdditiveOperator:op MultiplicativeExpression:right)*
                ;

            AdditiveOperator
                ::= PLUS
                |   MINUS
                ;

            RelationalExpression
                ::= AdditiveExpression:left (RelationalOperator:op AdditiveExpression:right)*
                ;

            RelationalOperator
                ::= LESS
                |   GREATER
                |   LEQ
                |   GEQ
                ;

            EqualityExpression
                ::= RelationalExpression:left (EqualityOperator:op RelationalExpression:right)*
                ;

            EqualityOperator
                ::= EQ
                |   NEQ
                ;
        }
    

Statements

The statements are next in line:

statement if‑statement | while‑statement | return‑statement | construction‑statement | assignment‑statement | compound‑statement

Our stragegy is to write a skeleton containing all required rules and fill them later:

        parser StatementParser
        {
            Statement
                ::= IfStatement:ifS
                |   WhileStatement:whileS
                |   ReturnStatement:returnS
                |   ConstructionStatement:constructionS
                |   AssignmentStatement:assignmentS
                |   CompoundStatement:compoundS
                ;

            IfStatement
                ::= empty
                ;

            WhileStatement
                ::= empty
                ;

            ReturnStatement
                ::= empty
                ;

            ConstructionStatement
                ::= empty
                ;

            AssignmentStatement
                ::= empty
                ;

            CompoundStatement
                ::= empty
                ;
        }
    

If Statement

Here's the grammar for an if-statement:

if‑statement if ( expression ) statement ( else statement )?

The conversion to a parser is quite straightforward:

        parser StatementParser
        {
            using ExpressionParser.Expression;
            
            // ...

            IfStatement
                ::= IF LPAREN Expression:condition RPAREN Statement:thenS (ELSE Statement:elseS)?
                ;
        }
    

While Statement

A grammar for a while-statement is just as easy:

while‑statement while ( expression ) statement

Here's the parser:

        parser StatementParser
        {
            // ...

            WhileStatement
                ::= WHILE LPAREN Expression:condition RPAREN Statement:statement
                ;
        }
    

Return Statement

The return-statement:

return‑statement return expression? ;

Converted:

        parser StatementParser
        {
            // ...

            ReturnStatement
                ::= RETURN Expression:returnValue? SEMICOLON
                ;
        }
    

Construction Statement

A construction-statement creates a variable and assigns it a value:

construction‑statement type identifier = expression ;

We need to have some using declarations for Type and Identifier rules:

        parser StatementParser
        {
            // ...
            using TypeParser.Type;
            using IdentifierParser.Identifier;

            // ...

            ConstructionStatement
                ::= Type:type Identifier:variableName ASSIGN Expression:value SEMICOLON
                ;
        }
    

Assignment Statement

The assignment-statement in turn:

assignment‑statement identifier = expression ;

Conversion is not harder than before:

        parser StatementParser
        {
            // ...

            AssignmentStatement
                ::= Identifier:variableName ASSIGN Expression:value SEMICOLON
                ;
        }
    

Compound Statement

Finally the compound statement:

compound‑statement { statement* }

A possibly empty sequence of statements enclosed in braces:

        parser StatementParser
        {
            // ...

            CompoundStatement
                ::= LBRACE Statement:statement* RBRACE
                ;
        }
    

Here's the whole statement parser:

        parser StatementParser
        {
            using ExpressionParser.Expression;
            using TypeParser.Type;
            using IdentifierParser.Identifier;

            Statement
                ::= IfStatement:ifS
                |   WhileStatement:whileS
                |   ReturnStatement:returnS
                |   ConstructionStatement:constructionS
                |   AssignmentStatement:assignmentS
                |   CompoundStatement:compoundS
                ;

            IfStatement
                ::= IF LPAREN Expression:condition RPAREN Statement:thenS (ELSE Statement:elseS)?
                ;

            WhileStatement
                ::= WHILE LPAREN Expression:condition RPAREN Statement:statement
                ;

            ReturnStatement
                ::= RETURN Expression:returnValue? SEMICOLON
                ;

            ConstructionStatement
                ::= Type:type Identifier:variableName ASSIGN Expression:value SEMICOLON
                ;

            AssignmentStatement
                ::= Identifier:variableName ASSIGN Expression:value SEMICOLON
                ;

            CompoundStatement
                ::= LBRACE Statement:statement* RBRACE
                ;
        }
    

Functions

A function has a return type, a possibly empty list of parameters enclosed in parentheses and a body:

function type identifier ( parameter‑list? ) compound‑statement
parameter‑list parameter (, parameter )*
parameter type identifier

There's nothing new in the function conversion, so we present here the entire parser at once:

        parser FunctionParser
        {
            using TypeParser.Type;
            using IdentifierParser.Identifier;
            using StatementParser.CompoundStatement;

            Function
                ::= Type:returnType Identifier:functionName LPAREN ParameterList:params? RPAREN CompoundStatement:functionBody
                ;

            ParameterList
                ::= Parameter:left (COMMA Parameter:right)*
                ;

            Parameter
                ::= Type:type Identifier:name
                ;
        }
    

Source Files

A Minilang source-file is just a sequence of functions:

source‑file function*

Here's the source file parser:

        parser SourceFileParser
        {
            using FunctionParser.Function;

            SourceFile
                ::= Function:function*
                ;
        }
    

Now the language has all the needed parsers defined. We will next see what additions are necessary to generate source code for the parsers and make the source code compile...


Footnote 1: We have placed this definition in the text file TypeParser.parser. It is not mandatory to have each parser definition in a separate text file. up: Table of contents | prev: Testing the Lexer | next: Building the Parsers