We encountered problems of having misleading error messages when running the parser tests. For example, an excessive comma in a error_comma.minilang file...
void foo,() { }
... is reported as "end of file expected" error:
parsing error in 'C:/soulng-1.0.0/examples/minilang/test/error_comma.minilang:1': end of file expected: void foo,() ^^^^
The misleading error message arise from the way the generated parser works. The generated parser is a top-down recursive descent backtracking parser. This means it starts parsing from the top element of the syntax and proceeds deeper and deeper in the syntax, until it hits a parsing rule that conflicts with the input. When that happens, it backtracts the input to the point where it was when it started to parse the current alternative, and tries the next alternative. It does this until it either succeeds with the whole input, or the whole input fails to match the top rule of the syntax.
Now to a concrete example: imagine that the minilang parser is parsing the input file error_comma.minilang that has the following content:
void foo,() { }
The top element of the syntax of minilang is the source-file rule that is implemented as the SourceFile parsing rule of SourceFileParser parser. The source-file rule contains a closure of functions, so a parsing proceeds to a function rule that is implemented as the Function parsing rule of the FunctionParser parser.
A function rule consists of a sequence of type, identifier, a left parenthesis, an optional list of parameters, a right parenthesis and a compound-statement. So, the parser starts to match the keyword void to the type parsing rule. It matches because the type rule has keyword void as one of its alternatives. The type rule is implemented in the generated parser as the Type rule of the TypeParser, that contains the token VOID as one of its alternatives. The lexer will return token VOID for a string void, and everything is all right so far.
So the parser advances the input to the next token, that is an ID token. It matches also because the Identifier parsing rule of the IdentifierParser consists of the token ID, and the lexer will return an ID token for an identifier foo.
The parser advances the input to the comma, that is returned as the token COMMA by the lexer. But now we have the LPAREN token to match against the COMMA token in the Function parsing rule of the FunctionParser. When that happens, the parser will backtrack to the next alternative with the input COMMA. The Function parsing rule of the FunctionParser has no alternatives left, so the parser backtracts to the parent of the Function parsing rule that is the SourceFile parsing rule of SourceFileParser parser.
The source-file rule consists of a closure of functions. The function rule failed to match, so the parser backtracks the input to the the point where it was when tried to match the function rule, that is the start of the input void foo,(). This means an empty sequence of functions matched. But now the input contains some extra characters void foo,() and an end of the file is expected at this point, so the parser will generate an error "end of file expected".
Parsing errors can be implemented for generated SoulNG parsers by introducing exclamation marks at suitable points of the syntax in the .parser files. An exclamation mark after a syntax element generates an error if that syntax element fails to match. A general strategy is to introduce the exclamation marks at points in sequences of syntax elements when we know it must be unambiguously the only alternative, so it makes no sense backtracking.
The source-file rule has no such points, but the function rule has many since the .minilang source files consist simply sequences of functions.
At first glance we may think that type in the function rule is the first such point, so we introduce an exclamation mark in the Function rule of the FunctionParser after the return type:
Function ::= Type:returnType! Identifier:functionName LPAREN ParameterList:params? RPAREN CompoundStatement:functionBody ;
Then we will regenerate the parsers by running the spg tool and recompiling:
C:\soulng-1.0.0\examples\minilang>spg MinilangParsers.spg
But when running the parser with the empty.minilang test file generates a parsing error, although it should not, because an empty input must be valid input:
> C:/soulng-1.0.0/examples/minilang/test/empty.minilang parsing error in 'C:/soulng-1.0.0/examples/minilang/test/empty.minilang:1': Type expected: ^
Therefore we remove the exclamation mark after the return type, but after we have seen the return type of the function we know that in this syntax the only valid alternative is that after it must come an identifier of the function, thus we place an exclamation mark after it. Also LPAREN is mandatory, so we place another exclamation mark after it. A parameter list is optional, so it has no exclamation mark, but then the RPAREN and the body of the function are mandatory, so we place exclamation marks after them:
Function ::= Type:returnType Identifier:functionName! LPAREN! ParameterList:params? RPAREN! CompoundStatement:functionBody! ;
After seen a COMMA in a parameter list, there must be another parameter coming, so we introduce an exclamation mark after the second parameter element:
ParameterList ::= Parameter:left (COMMA Parameter:right!)* ;
We will regenerate the parsers by running the spg tool, recompile, and test. Now empty input generates no error, as it should not:
> C:/soulng-1.0.0/examples/minilang/test/empty.minilang end of file 'C:/soulng-1.0.0/examples/minilang/test/empty.minilang' reached
Running the minilang tester with the input file containing an excessive comma now generates a decent error message:
> C:/soulng-1.0.0/examples/minilang/test/error_comma.minilang parsing error in 'C:/soulng-1.0.0/examples/minilang/test/error_comma.minilang:1': '(' expected: void foo,() ^
But running the minilang tester with the test file with missing semicolon generates still a misleading error message:
> C:/soulng-1.0.0/examples/minilang/test/error_semicolon.minilang parsing error in 'C:/soulng-1.0.0/examples/minilang/test/error_semicolon.minilang:2': CompoundStatement expected: { ^
The problem is that we have not yet introduced the exclamation marks in the statement syntax.
After seeing the keyword if of the if statement, we known then must unambiguously come LPAREN, a condition expression, an RPAREN, and a statement:
IfStatement ::= IF LPAREN! Expression:condition! RPAREN! Statement:thenS! (ELSE Statement:elseS!)? ;
Also after the keyword else there must be a statement, although the else part is optional in full.
The while statement needs also exclamation marks for the LPAREN, condition, RPAREN and the statement, because after seeing keyword while they are mandatory:
WhileStatement ::= WHILE LPAREN! Expression:condition! RPAREN! Statement:statement! ;
The semicolon is mandatory in the return statement:
ReturnStatement ::= RETURN Expression:returnValue? SEMICOLON!
In this language, after seeing a type in a statement, we know it must be a construction statement, so we introduce exclamation marks after the identifier, the assignment symbol, the expression, and the semicolon:
ConstructionStatement ::= Type:type Identifier:variableName! ASSIGN! Expression:value! SEMICOLON! ;
Also after seeing an identifier, it must be an assignment statement, so the assignment symbol, a source expression and the semicolon have exclamation marks:
AssignmentStatement ::= Identifier:variableName ASSIGN! Expression:value! SEMICOLON! ;
Finally the right brace in the compound statement is mandatory:
CompoundStatement ::= LBRACE Statement:statement* RBRACE! ;
Now testing with the following content with a missing semicolon...
int gcd(int a, int b) { while (b != 0) { a = a % b; int t = a a = b; b = t; } return a; }
... generates an error message pointing to the start of the assignment statement a = b;, not pointing to the end of the preceding construction statement int t = a as it should:
C:/soulng-1.0.0/examples/minilang/test/error_semicolon.minilang parsing error in 'C:/soulng-1.0.0/examples/minilang/test/error_semicolon.minilang:7': ';' expected: a = b; ^
This is unfortunate, but we have not yet found a solution to this problem.
We have inserted exclamation marks in the following places in the expression syntax:
Expression ::= EqualityExpression:expr ; PrimaryExpression ::= Literal:literal | Identifier:identifier | LPAREN Expression:expression RPAREN ; PostfixExpression ::= PrimaryExpression:primary (LPAREN ExpressionList:args? RPAREN!)* ; ExpressionList ::= Expression:left (COMMA Expression:right!)* ; UnaryExpression ::= UnaryOperator:op UnaryExpression:unaryExpr! | PostfixExpression:postfixExpr ; UnaryOperator ::= PLUS | MINUS | NOT ; MultiplicativeExpression ::= UnaryExpression:left (MultiplicativeOperator:op UnaryExpression:right!)* ; MultiplicativeOperator ::= MUL | DIV | MOD ; AdditiveExpression ::= MultiplicativeExpression:left (AdditiveOperator:op MultiplicativeExpression:right!)* ; AdditiveOperator ::= PLUS | MINUS ; RelationalExpression ::= AdditiveExpression:left (RelationalOperator:op AdditiveExpression:right!)* ; RelationalOperator ::= LESS | GREATER | LEQ | GEQ ; EqualityExpression ::= RelationalExpression:left (EqualityOperator:op RelationalExpression:right!)* ;
The exclamation marks are inserted to the points after seeing punctuation and operator symbols.
That's the end of this section. Next we will see how to generate an abstract syntax tree along with the parsing....
up: Table of contents | prev: Testing the Generated Parsers | next: Abstract Syntax Tree