We will convert each Minilang grammar table to a parser.
The parser descriptions are contained in .parser files that are read by the SoulNG parser generator spg.
We start with the types:
type | → | int | bool | void |
The names of the types, int, bool and void, are keywords in the language, and the lexical analyzer has corresponding tokens INT, BOOL and VOID that it returns to the parser, so the conversion is easy
Footnote 1 parser TypeParser { Type ::= INT | BOOL | VOID ; }
A parser declaration consists of the keyword parser followed by the name of the parser followed by the definition of the parser enclosed in braces. The parser definition consists of parsing rules each of which consists of a name followed by the "produces" symbol, "::=", followed by the body of the rule and terminated by the semicolon.
In the body of a rule lexer tokens are referred directly by name, that is typically written in capital letters. The alternatives are separated by the '|' symbol.
The grammar for literals has more complex structure:
literal | → | boolean‑literal | integer‑literal |
boolean‑literal | → | true | false |
integer‑literal | → | digit‑sequence |
digit‑sequence | → | [0-9]+ |
We start with the literal grammar rule. This rule refers to other grammar rules boolean-literal and integer-literal. That can be directly expressed in the parser:
parser LiteralParser { Literal ::= BooleanLiteral:booleanLiteral | IntegerLiteral:integerLiteral ; }
The Literal parsing rule above refers to parsing rules BooleanLiteral and IntegerLiteral.
When a parsing rule is referenced in the body of another parsing rule, it must be given a unique name within the body of that rule
for the reasons that will become clear in the future. The syntax for naming is ParsingRule
The grammar rule boolean-literal contains just keywords, so the lexer has corresponding tokens for them:
parser LiteralParser { // .. BooleanLiteral ::= TRUE | FALSE ; }
The lexical analyzer has a rule integer_literal that returns token INTLIT. We will therefore define corresponding parsing rule IntegerLiteral, that consists of just that INTLIT token:
parser LiteralParser { // .. IntegerLiteral ::= INTLIT ; }
This is the whole parser for the literal grammar:
parser LiteralParser { Literal ::= BooleanLiteral:booleanLiteral | IntegerLiteral:integerLiteral ; BooleanLiteral ::= TRUE | FALSE ; IntegerLiteral ::= INTLIT ; }
The identifier parser will be just as easy as the type parser:
identifier | → | Unicode identifier |
The lexer has rule identifier that returns the ID token for identifiers:
parser IdentifierParser { Identifier ::= ID ; }
We start converting grammars for expression from the grammar for primary-expression:
primary‑expression | → | literal | identifier | ( expression ) |
In the body of a parsing rule we may also refer to parsing rules defined in other parsers. For that, we must include a using declaration for the foreign rule in the definition of a parser:
parser ExpressionParser { using LiteralParser.Literal; using IdentifierParser.Identifier; Expresssion ::= empty ; PrimaryExpression ::= Literal:literal | Identifier:identifier | LPAREN Expression:expression RPAREN ; }
A using declaration consists of the keyword using followed by the name of a parser followed by '.' followed by the name of a rule defined in that parser. We have included using declarations for the Literal rule defined in the LiteralParser parser and the for the Identifier rule defined in the IdentifierParser parser.
We have temporarily defined a rule Expression whose body consists just of the keyword empty in order to refer to the Expression rule in the body of the PrimaryExpression rule.
The body of the PrimaryExpression rule consists of alternatives for Literal, Identifier and parenthesized Expression. A sequence consisting of tokens and references to parsing rules can be expression by separating the elements of the sequence with white space.
The postfix-expression grammar rule has some repeating and optional elements:
postfix‑expression | → | primary‑expression (( expression‑list? ))* |
expression-list | → | expression (, expression)* |
In a body of a parsing rule the '?' symbol is used for optional elements, the '*' symbol for a Kleene closure of elements and the parentheses are used for grouping elements:
parser ExpressionParser { // ... PostfixExpression ::= PrimaryExpression:primary (LPAREN ExpressionList:args? RPAREN)* ; ExpressionList ::= Expression:left (COMMA Expression:right)* ; }
The closure is applied to the whole LPAREN ExpressionList RPAREN sequence, so those are enclosed in parentheses. Also the sequence COMMA Expression may be repeating, so it is parenthesized.
The unary-expression grammar rule is recursive:
unary‑expression | → | (+ | - | !) unary‑expression | postfix‑expression |
We have separated the unary operator to its own parsing rule, although it is not mandatory:
parser ExpressionParser { // ... UnaryExpression ::= UnaryOperator:op UnaryExpression:unaryExpr | PostfixExpression:postfixExpr ; UnaryOperator ::= PLUS | MINUS | NOT ; }
A multiplicative-expression consists of unary-expressions separated by multiplicative operator symbols:
multiplicative‑expression | → | unary‑expression ((* | / | %) unary‑expression)* |
In this case we have again separated the operator to its own rule:
parser ExpressionParser { // ... MultiplicativeExpression ::= UnaryExpression:left (MultiplicativeOperator:op UnaryExpression:right)* ; MultiplicativeOperator ::= MUL | DIV | MOD ; }
The addive-expression has the same structure as the multiplicative-expression:
additive‑expression | → | multiplicative‑expression ((+ | -) multiplicative‑expression)* |
Nothing new:
parser ExpressionParser { // ... AdditiveExpression ::= MultiplicativeExpression:left (AdditiveOperator:op MultiplicativeExpression:right)* ; AdditiveOperator ::= PLUS | MINUS ; }
Same structure, just different symbols and different precedence level:
relational‑expression | → | additive‑expression ((< | > | <= | >=) additive‑expression)* |
parser ExpressionParser { // ... RelationalExpression ::= AdditiveExpression:left (RelationalOperator:op AdditiveExpression:right)* ; RelationalOperator ::= LESS | GREATER | LEQ | GEQ ; }
And finally the equality expression:
equality‑expression | → | relational‑expression ((== | !=) relational‑expression)* |
parser ExpressionParser { // ... EqualityExpression ::= RelationalExpression:left (EqualityOperator:op RelationalExpression:right)* ; EqualityOperator ::= EQ | NEQ ; }
The equality expression completes the loop:
expression | → | equality‑expression |
Here's the whole expression parser:
parser ExpressionParser { using LiteralParser.Literal; using IdentifierParser.Identifier; Expression ::= EqualityExpression:expr ; PrimaryExpression ::= Literal:literal | Identifier:identifier | LPAREN Expression:expression RPAREN ; PostfixExpression ::= PrimaryExpression:primary (LPAREN ExpressionList:args? RPAREN)* ; ExpressionList ::= Expression:left (COMMA Expression:right)* ; UnaryExpression ::= UnaryOperator:op UnaryExpression:unaryExpr | PostfixExpression:postfixExpr ; UnaryOperator ::= PLUS | MINUS | NOT ; MultiplicativeExpression ::= UnaryExpression:left (MultiplicativeOperator:op UnaryExpression:right)* ; MultiplicativeOperator ::= MUL | DIV | MOD ; AdditiveExpression ::= MultiplicativeExpression:left (AdditiveOperator:op MultiplicativeExpression:right)* ; AdditiveOperator ::= PLUS | MINUS ; RelationalExpression ::= AdditiveExpression:left (RelationalOperator:op AdditiveExpression:right)* ; RelationalOperator ::= LESS | GREATER | LEQ | GEQ ; EqualityExpression ::= RelationalExpression:left (EqualityOperator:op RelationalExpression:right)* ; EqualityOperator ::= EQ | NEQ ; }
The statements are next in line:
statement | → | if‑statement | while‑statement | return‑statement | construction‑statement | assignment‑statement | compound‑statement |
Our stragegy is to write a skeleton containing all required rules and fill them later:
parser StatementParser { Statement ::= IfStatement:ifS | WhileStatement:whileS | ReturnStatement:returnS | ConstructionStatement:constructionS | AssignmentStatement:assignmentS | CompoundStatement:compoundS ; IfStatement ::= empty ; WhileStatement ::= empty ; ReturnStatement ::= empty ; ConstructionStatement ::= empty ; AssignmentStatement ::= empty ; CompoundStatement ::= empty ; }
Here's the grammar for an if-statement:
if‑statement | → | if ( expression ) statement ( else statement )? |
The conversion to a parser is quite straightforward:
parser StatementParser { using ExpressionParser.Expression; // ... IfStatement ::= IF LPAREN Expression:condition RPAREN Statement:thenS (ELSE Statement:elseS)? ; }
A grammar for a while-statement is just as easy:
while‑statement | → | while ( expression ) statement |
Here's the parser:
parser StatementParser { // ... WhileStatement ::= WHILE LPAREN Expression:condition RPAREN Statement:statement ; }
The return-statement:
return‑statement | → | return expression? ; |
Converted:
parser StatementParser { // ... ReturnStatement ::= RETURN Expression:returnValue? SEMICOLON ; }
A construction-statement creates a variable and assigns it a value:
construction‑statement | → | type identifier = expression ; |
We need to have some using declarations for Type and Identifier rules:
parser StatementParser { // ... using TypeParser.Type; using IdentifierParser.Identifier; // ... ConstructionStatement ::= Type:type Identifier:variableName ASSIGN Expression:value SEMICOLON ; }
The assignment-statement in turn:
assignment‑statement | → | identifier = expression ; |
Conversion is not harder than before:
parser StatementParser { // ... AssignmentStatement ::= Identifier:variableName ASSIGN Expression:value SEMICOLON ; }
Finally the compound statement:
compound‑statement | → | { statement* } |
A possibly empty sequence of statements enclosed in braces:
parser StatementParser { // ... CompoundStatement ::= LBRACE Statement:statement* RBRACE ; }
Here's the whole statement parser:
parser StatementParser { using ExpressionParser.Expression; using TypeParser.Type; using IdentifierParser.Identifier; Statement ::= IfStatement:ifS | WhileStatement:whileS | ReturnStatement:returnS | ConstructionStatement:constructionS | AssignmentStatement:assignmentS | CompoundStatement:compoundS ; IfStatement ::= IF LPAREN Expression:condition RPAREN Statement:thenS (ELSE Statement:elseS)? ; WhileStatement ::= WHILE LPAREN Expression:condition RPAREN Statement:statement ; ReturnStatement ::= RETURN Expression:returnValue? SEMICOLON ; ConstructionStatement ::= Type:type Identifier:variableName ASSIGN Expression:value SEMICOLON ; AssignmentStatement ::= Identifier:variableName ASSIGN Expression:value SEMICOLON ; CompoundStatement ::= LBRACE Statement:statement* RBRACE ; }
A function has a return type, a possibly empty list of parameters enclosed in parentheses and a body:
function | → | type identifier ( parameter‑list? ) compound‑statement |
parameter‑list | → | parameter (, parameter )* |
parameter | → | type identifier |
There's nothing new in the function conversion, so we present here the entire parser at once:
parser FunctionParser { using TypeParser.Type; using IdentifierParser.Identifier; using StatementParser.CompoundStatement; Function ::= Type:returnType Identifier:functionName LPAREN ParameterList:params? RPAREN CompoundStatement:functionBody ; ParameterList ::= Parameter:left (COMMA Parameter:right)* ; Parameter ::= Type:type Identifier:name ; }
A Minilang source-file is just a sequence of functions:
source‑file | → | function* |
Here's the source file parser:
parser SourceFileParser { using FunctionParser.Function; SourceFile ::= Function:function* ; }
Now the language has all the needed parsers defined. We will next see what additions are necessary to generate source code for the parsers and make the source code compile...