Tutorial
In this tutorial we will generate a lexical analyzer and parser using Soul 5.1.0
for a string that contains a list of fruit names such as
(apple, orange, banana)
The generated lexer will tokenize the input string and the parser will consume the tokens and make a structured representation of the input.
Probably the easiest and fastest way to create a new application using Soul 5.1.0
is to add it to the soul.sln
solution:
Now the program should compile, so right-click the fruits project and select Build.
Generating a lexical analyzer
First we will create tokens that the lexical analyzer will return when the fruit list is scanned:
-
Right-click the fruits
project and select Add | New Item...
.
-
Enter fruits.token
as file name and click Add
.
-
Enter the following text:
tokens fruits.token
{
(APPLE, "'apple'"), (ORANGE, "'orange'"), (BANANA, "'banana'"), (COMMA, ","), (LPAREN, "("), (RPAREN, ")"), (ID, "identifier")
}
A token declaration consists of a token name and a token description string, written as a pair in parentheses. This token file declares tokens for the fruit names, the punctuation, and identifiers.
Next we will create a lexer project file
that references the token file:
Now we can create C++ code for the token declarations by running the lexer generator slg
:
-
Start a terminal in the C:\soul-5.1.0\fruits directory and enter the following command:
Microsoft Windows [Version 10.0.26200.8246]
(c) Microsoft Corporation. All rights reserved.
C:\soul-5.1.0\fruits>slg -v fruits.slg
> C:/soul-5.1.0/fruits/fruits.slg
generating lexers for project 'fruits.slg'...
> C:/soul-5.1.0/fruits/fruits.token
==> C:/soul-5.1.0/fruits/fruits.token.cppm
lexers for project 'fruits.slg' generated successfully.
C:\soul-5.1.0\fruits>
-
The lexer generator has generated the following C++ module interface unit fruits.token.cppm
for the token declarations:
export module fruits.token;
import std;
export namespace fruits::token {
constexpr std::int32_t tokenSetID = 175489805;
constexpr std::int64_t APPLE = (static_cast<std::int64_t>(tokenSetID) << 32) | 1;
constexpr std::int64_t ORANGE = (static_cast<std::int64_t>(tokenSetID) << 32) | 2;
constexpr std::int64_t BANANA = (static_cast<std::int64_t>(tokenSetID) << 32) | 3;
constexpr std::int64_t COMMA = (static_cast<std::int64_t>(tokenSetID) << 32) | 4;
constexpr std::int64_t LPAREN = (static_cast<std::int64_t>(tokenSetID) << 32) | 5;
constexpr std::int64_t RPAREN = (static_cast<std::int64_t>(tokenSetID) << 32) | 6;
constexpr std::int64_t ID = (static_cast<std::int64_t>(tokenSetID) << 32) | 7;
}
Each token has a unique 64-bit identifier that consists of a 32-bit token set identifier in the upper 32 bits and a 32-bit serial number in the lower 32 bits. The token set identifier is a hash value of the name of the token collection, fruits.token.
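The packing shown in fruits.token.cppm can be sketched with a few helper functions. The helper names make_token_id, token_set_of and serial_of are hypothetical and not part of Soul; only the shift-and-or expression and the tokenSetID value are taken from the generated file. The sketch assumes non-negative serial numbers.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of Soul) that mirror the expressions in fruits.token.cppm.
namespace token_id_sketch {

// Packs a 32-bit token set identifier and a 32-bit serial number into one 64-bit identifier.
constexpr std::int64_t make_token_id(std::int32_t tokenSetId, std::int32_t serial)
{
    return (static_cast<std::int64_t>(tokenSetId) << 32) | serial;
}

// Extracts the token set identifier from the upper 32 bits.
constexpr std::int32_t token_set_of(std::int64_t tokenId)
{
    return static_cast<std::int32_t>(tokenId >> 32);
}

// Extracts the serial number from the lower 32 bits.
constexpr std::int32_t serial_of(std::int64_t tokenId)
{
    return static_cast<std::int32_t>(tokenId & 0xFFFFFFFF);
}

constexpr std::int32_t tokenSetID = 175489805; // value copied from the generated file
constexpr std::int64_t APPLE = make_token_id(tokenSetID, 1);
constexpr std::int64_t ID = make_token_id(tokenSetID, 7);

} // namespace token_id_sketch
```

Because the token set identifier is part of every token identifier, tokens with the same serial number in two different token collections still get different 64-bit values.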
In this fruit list language the name of each fruit is also a keyword, that is, a reserved word. So next we will create a file for the keywords:
-
Right-click the fruits
project and select Add | New Item...
.
-
Enter fruits.keyword
as file name and click Add
.
-
Enter the following text:
keywords fruits.keyword
{
("apple", APPLE), ("orange", ORANGE), ("banana", BANANA)
}
A keyword declaration consists of a keyword string and a token name, written as a pair in parentheses.
Now we can update the lexical analyzer's project file fruits.slg
:
-
Double-click the fruits.slg
file and edit the contents to be the following:
project fruits.slg;
tokens <fruits.token>;
keywords <fruits.keyword>;
-
We can now run the lexer generator slg
again:
C:\soul-5.1.0\fruits>slg -v fruits.slg
> C:/soul-5.1.0/fruits/fruits.slg
generating lexers for project 'fruits.slg'...
> C:/soul-5.1.0/fruits/fruits.token
==> C:/soul-5.1.0/fruits/fruits.token.cppm
> C:/soul-5.1.0/fruits/fruits.keyword
lexers for project 'fruits.slg' generated successfully.
At this point the keyword file does not produce any C++ code, but the generator parses it and checks its validity.
Next we will create regular expressions for fruit name identifiers and whitespace separator characters:
Again we can update the lexer project file fruits.slg
:
-
Double-click the fruits.slg
file and edit the contents to be the following:
project fruits.slg;
tokens <fruits.token>;
keywords <fruits.keyword>;
expressions <fruits.expr>;
-
Running lexer generator slg
again will parse the expressions file:
C:\soul-5.1.0\fruits>slg -v fruits.slg
> C:/soul-5.1.0/fruits/fruits.slg
generating lexers for project 'fruits.slg'...
> C:/soul-5.1.0/fruits/fruits.token
==> C:/soul-5.1.0/fruits/fruits.token.cppm
> C:/soul-5.1.0/fruits/fruits.keyword
> C:/soul-5.1.0/fruits/fruits.expr
lexers for project 'fruits.slg' generated successfully.
We will now complete the fruit list lexical analyzer by adding a lexer file fruits.lexer
:
-
Right-click the fruits
project and select Add | New Item...
.
-
Enter fruits.lexer
as file name and click Add
.
-
Enter the following text:
export module fruits.lexer;
import fruits.token;
import fruits.keyword;
import fruits.expr;
lexer FruitLexer
{
rules
{
"{separators}" {}
"{identifier}"
{
std::int64_t kw = lexer.GetKeywordToken(lexer.CurrentToken().match);
if (kw == soul::lexer::INVALID_TOKEN) return ID; else return kw;
}
"," { return COMMA; }
"\(" { return LPAREN; }
"\)" { return RPAREN; }
}
}
-
The lexer file
has the following structure:
-
First comes the export module
declaration that contains the name of the module interface unit that the generator will put the lexer in. In this case the name of the module interface unit will be fruits.lexer
.
-
Then comes import
declarations for the token, keyword and expression collections.
-
Finally comes the lexer declaration. It contains rules, each of which pairs a regular expression with a body that is a C++ compound statement. When the input matches the regular expression of a rule, the body of that rule is executed. If several rules match, the body of the rule with the longest match is executed.
-
The first rule skips the separator characters. The name of the regular expression to match is separators. In general, an identifier in braces matches the expression whose name equals that identifier. The body of the rule is empty, which means that the lexical analyzer returns no token for the match and continues with the rest of the input.
-
The next rule will match identifiers and keywords. If the lexeme of the current token matches a regular expression whose name is identifier
, the body of the rule will check if the token is actually a keyword. If that is the case, the lexical analyzer will return the token identifier of that keyword, otherwise the lexical analyzer will return token identifier ID
.
-
The rest of the rules match punctuation tokens. If the lexeme of the current token matches a comma character, the token identifier COMMA is returned. The regular expressions for the left and right parentheses are escaped with the backslash character. Parentheses are grouping operators in regular expressions, so they must be escaped to match literal parenthesis characters.
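The behavior of the rules described above can be sketched as a small hand-written tokenizer. This is not the code that slg generates. The definitions of separators (space, tab, carriage return, line feed) and identifier (a letter or underscore followed by letters, digits or underscores) are assumptions, and the Tok enumeration, keyword_token and tokenize are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

namespace lexer_sketch {

// Hypothetical token kinds; the generated lexer uses 64-bit token identifiers instead.
enum class Tok { APPLE, ORANGE, BANANA, COMMA, LPAREN, RPAREN, ID, INVALID };

struct Token
{
    Tok id;
    std::string match; // the lexeme
};

// Assumed character sets for the "separators" and "identifier" expressions.
inline bool is_separator(char c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; }
inline bool is_id_start(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_'; }
inline bool is_id_rest(char c) { return is_id_start(c) || (c >= '0' && c <= '9'); }

// Plays the role of a keyword lookup: returns INVALID when the lexeme is not a keyword.
inline Tok keyword_token(std::string_view lexeme)
{
    static const std::unordered_map<std::string_view, Tok> keywords = {
        { "apple", Tok::APPLE }, { "orange", Tok::ORANGE }, { "banana", Tok::BANANA }
    };
    auto it = keywords.find(lexeme);
    return it == keywords.end() ? Tok::INVALID : it->second;
}

inline std::vector<Token> tokenize(std::string_view input)
{
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size())
    {
        const char c = input[i];
        if (is_separator(c))
        {
            ++i; // like the empty "{separators}" rule body: skip without producing a token
        }
        else if (is_id_start(c))
        {
            const std::size_t start = i;
            while (i < input.size() && is_id_rest(input[i])) ++i; // take the longest match
            std::string lexeme(input.substr(start, i - start));
            const Tok kw = keyword_token(lexeme);
            tokens.push_back(Token{ kw == Tok::INVALID ? Tok::ID : kw, lexeme });
        }
        else
        {
            Tok id = Tok::INVALID; // any other character is reported as an invalid token
            if (c == ',') id = Tok::COMMA;
            else if (c == '(') id = Tok::LPAREN;
            else if (c == ')') id = Tok::RPAREN;
            tokens.push_back(Token{ id, std::string(1, c) });
            ++i;
        }
    }
    return tokens;
}

} // namespace lexer_sketch
```

With these assumptions, the input apples is scanned as a single ID token because the identifier rule consumes the longest possible lexeme before the keyword lookup takes place.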
Finally we complete the lexical analyzer's project file fruits.slg
:
-
Double-click the fruits.slg
file and edit the contents to be the following:
project fruits.slg;
tokens <fruits.token>;
keywords <fruits.keyword>;
expressions <fruits.expr>;
lexer <fruits.lexer>;
-
Now we can generate C++ code for the lexical analyzer by running slg
:
C:\soul-5.1.0\fruits>slg -v fruits.slg
> C:/soul-5.1.0/fruits/fruits.slg
generating lexers for project 'fruits.slg'...
> C:/soul-5.1.0/fruits/fruits.token
==> C:/soul-5.1.0/fruits/fruits.token.cppm
> C:/soul-5.1.0/fruits/fruits.keyword
> C:/soul-5.1.0/fruits/fruits.expr
> C:/soul-5.1.0/fruits/fruits.lexer
==> C:/soul-5.1.0/fruits/fruits.lexer.classmap
==> C:/soul-5.1.0/fruits/fruits.lexer.classmap.compressed
==> C:/soul-5.1.0/fruits/fruits.lexer.classmap.rc
==> C:/soul-5.1.0/fruits/fruits.lexer.cppm
==> C:/soul-5.1.0/fruits/fruits.lexer.cpp
lexers for project 'fruits.slg' generated successfully.
C:\soul-5.1.0\fruits>
The lexer generator has now generated several binary and text files:
-
fruits.token.cppm
is a C++ module interface unit that contains token identifiers.
-
fruits.lexer.classmap
is a binary file that contains a character class map. The map assigns each Unicode character a character class identifier, which is a 32-bit integer.
-
fruits.lexer.classmap.compressed
is a binary file that contains ZIP-compressed classmap.
-
fruits.lexer.classmap.rc
is a resource script text file that can be added to a C++ executable project. The script file will embed the classmap in the executable.
-
fruits.lexer.cppm
is a C++ module interface unit that contains a state machine of the lexical analyzer and a factory function MakeLexer
for instantiating the lexical analyzer.
-
fruits.lexer.cpp
is a C++ module implementation unit that contains token and keyword descriptions.
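The idea of a character class map can be sketched as a lookup from a code point to a small integer. The binary format of fruits.lexer.classmap and the class numbers chosen by the generator are not described here, so the sorted range table, the class numbers and the get_class function below are purely hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

namespace classmap_sketch {

// One inclusive range of code points that share a character class.
struct Range
{
    char32_t first;
    char32_t last;
    std::int32_t cls;
};

// Hypothetical class numbering: 0 = separator, 1 = digit, 2 = letter or underscore,
// 3 = comma, 4 = left parenthesis, 5 = right parenthesis.
// The ranges are sorted by 'first' and do not overlap.
inline const std::vector<Range>& ranges()
{
    static const std::vector<Range> r = {
        { U'\t', U'\n', 0 },
        { U'\r', U'\r', 0 },
        { U' ', U' ', 0 },
        { U'(', U'(', 4 },
        { U')', U')', 5 },
        { U',', U',', 3 },
        { U'0', U'9', 1 },
        { U'A', U'Z', 2 },
        { U'_', U'_', 2 },
        { U'a', U'z', 2 }
    };
    return r;
}

// Returns the class of a code point, or -1 if the code point belongs to no class.
inline std::int32_t get_class(char32_t c)
{
    const auto& r = ranges();
    // Find the first range that starts after c, then step back to the candidate range.
    auto it = std::upper_bound(r.begin(), r.end(), c,
        [](char32_t value, const Range& range) { return value < range.first; });
    if (it == r.begin()) return -1;
    --it;
    return c <= it->last ? it->cls : -1;
}

} // namespace classmap_sketch
```

A state machine that switches on such class numbers stays small regardless of how many characters belong to each class.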
-
The lexer generator has generated the following C++ module interface unit fruits.lexer.cppm
:
export module fruits.lexer;
import std;
import soul.lexer;
import soul.ast.slg;
import soul.ast.common;
import util;
import fruits.token;
export namespace fruits::lexer {
enum class Tag
{
tag
};
std::mutex& MakeLexerMtx();
template<typename Char>
struct FruitLexer;
template<typename Char>
soul::lexer::Lexer<FruitLexer<Char>, Char> MakeLexer(const Char* start, const Char* end, const std::string& fileName);
template<typename Char>
soul::lexer::Lexer<FruitLexer<Char>, Char> MakeLexer(const std::string& moduleFileName, util::ResourceFlags resourceFlags, const Char* start, const Char* end, const std::string& fileName);
soul::ast::common::TokenCollection* GetTokens(fruits::lexer::Tag tag);
struct FruitLexer_Variables : public soul::lexer::Variables
{
FruitLexer_Variables();
};
template<typename Char>
struct FruitLexer
{
using Variables = FruitLexer_Variables;
static std::int32_t NextState(std::int32_t state, Char chr, soul::lexer::LexerBase<Char>& lexer)
{
soul::lexer::ClassMap<Char>* classmap = lexer.GetClassMap();
std::int32_t cls = classmap->GetClass(chr);
switch (state)
{
case 0:
{
switch (cls)
{
case 4:
case 5:
case 6:
case 7:
{
return 1;
}
case 9:
{
return 2;
}
case 10:
{
return 3;
}
case 11:
{
return 4;
}
case 12:
{
return 5;
}
default:
{
return -1;
}
}
}
case 5:
{
auto& token = lexer.CurrentToken();
auto prevMatch = token.match;
token.match = lexer.CurrentLexeme();
std::int64_t tokenId = GetTokenId(4, lexer);
if (tokenId == soul::lexer::CONTINUE_TOKEN)
{
token.id = soul::lexer::CONTINUE_TOKEN;
return -1;
}
else if (tokenId != soul::lexer::INVALID_TOKEN)
{
token.id = tokenId;
}
else
{
token.match = prevMatch;
}
return -1;
}
case 4:
{
auto& token = lexer.CurrentToken();
auto prevMatch = token.match;
token.match = lexer.CurrentLexeme();
std::int64_t tokenId = GetTokenId(3, lexer);
if (tokenId == soul::lexer::CONTINUE_TOKEN)
{
token.id = soul::lexer::CONTINUE_TOKEN;
return -1;
}
else if (tokenId != soul::lexer::INVALID_TOKEN)
{
token.id = tokenId;
}
else
{
token.match = prevMatch;
}
return -1;
}
case 3:
{
auto& token = lexer.CurrentToken();
auto prevMatch = token.match;
token.match = lexer.CurrentLexeme();
std::int64_t tokenId = GetTokenId(2, lexer);
if (tokenId == soul::lexer::CONTINUE_TOKEN)
{
token.id = soul::lexer::CONTINUE_TOKEN;
return -1;
}
else if (tokenId != soul::lexer::INVALID_TOKEN)
{
token.id = tokenId;
}
else
{
token.match = prevMatch;
}
return -1;
}
case 2:
{
auto& token = lexer.CurrentToken();
auto prevMatch = token.match;
token.match = lexer.CurrentLexeme();
std::int64_t tokenId = GetTokenId(1, lexer);
if (tokenId == soul::lexer::CONTINUE_TOKEN)
{
token.id = soul::lexer::CONTINUE_TOKEN;
return -1;
}
else if (tokenId != soul::lexer::INVALID_TOKEN)
{
token.id = tokenId;
}
else
{
token.match = prevMatch;
}
switch (cls)
{
case 8:
case 9:
{
return 6;
}
default:
{
return -1;
}
}
}
case 6:
{
auto& token = lexer.CurrentToken();
auto prevMatch = token.match;
token.match = lexer.CurrentLexeme();
std::int64_t tokenId = GetTokenId(1, lexer);
if (tokenId == soul::lexer::CONTINUE_TOKEN)
{
token.id = soul::lexer::CONTINUE_TOKEN;
return -1;
}
else if (tokenId != soul::lexer::INVALID_TOKEN)
{
token.id = tokenId;
}
else
{
token.match = prevMatch;
}
switch (cls)
{
case 8:
case 9:
{
return 6;
}
default:
{
return -1;
}
}
}
case 1:
{
auto& token = lexer.CurrentToken();
auto prevMatch = token.match;
token.match = lexer.CurrentLexeme();
std::int64_t tokenId = GetTokenId(0, lexer);
if (tokenId == soul::lexer::CONTINUE_TOKEN)
{
token.id = soul::lexer::CONTINUE_TOKEN;
return -1;
}
else if (tokenId != soul::lexer::INVALID_TOKEN)
{
token.id = tokenId;
}
else
{
token.match = prevMatch;
}
switch (cls)
{
case 4:
case 5:
case 6:
case 7:
{
return 1;
}
default:
{
return -1;
}
}
}
}
return -1;
}
static std::int64_t GetTokenId(std::int32_t ruleIndex, soul::lexer::LexerBase<Char>& lexer)
{
switch (ruleIndex)
{
case 0:
{
lexer.Retract();
break;
}
case 1:
{
lexer.Retract();
std::int64_t kw = lexer.GetKeywordToken(lexer.CurrentToken().match);
if (kw == soul::lexer::INVALID_TOKEN) return fruits::token::ID;
else return kw;
break;
}
case 2:
{
lexer.Retract();
return fruits::token::COMMA;
break;
}
case 3:
{
lexer.Retract();
return fruits::token::LPAREN;
break;
}
case 4:
{
lexer.Retract();
return fruits::token::RPAREN;
break;
}
}
return soul::lexer::CONTINUE_TOKEN;
}
};
template<typename Char>
soul::lexer::ClassMap<Char>* GetClassMap(fruits::lexer::Tag tag)
{
static std::unique_ptr<soul::lexer::ClassMap<Char>> classmap(soul::lexer::MakeClassMap<Char>("fruits.lexer.classmap"));
return classmap.get();
}
template<typename Char>
soul::lexer::ClassMap<Char>* GetClassMap(const std::string& moduleFileName, util::ResourceFlags resourceFlags, fruits::lexer::Tag tag)
{
static std::unique_ptr<soul::lexer::ClassMap<Char>> classmap(soul::lexer::MakeClassMap<Char>(moduleFileName, "fruits.lexer.classmap", resourceFlags));
return classmap.get();
}
template<typename Char>
soul::lexer::KeywordMap<Char>* GetKeywords(fruits::lexer::Tag tag);
template<>
soul::lexer::KeywordMap<char>* GetKeywords<char>(fruits::lexer::Tag tag);
template<>
soul::lexer::KeywordMap<char8_t>* GetKeywords<char8_t>(fruits::lexer::Tag tag);
template<>
soul::lexer::KeywordMap<char16_t>* GetKeywords<char16_t>(fruits::lexer::Tag tag);
template<>
soul::lexer::KeywordMap<char32_t>* GetKeywords<char32_t>(fruits::lexer::Tag tag);
template<typename Char>
soul::lexer::Lexer<FruitLexer<Char>, Char> MakeLexer(const Char* start, const Char* end, const std::string& fileName)
{
std::lock_guard<std::mutex> lock(MakeLexerMtx());
auto lexer = soul::lexer::Lexer<FruitLexer<Char>, Char>(start, end, fileName);
lexer.SetClassMap(GetClassMap<Char>(fruits::lexer::Tag()));
lexer.SetTokenCollection(GetTokens(fruits::lexer::Tag()));
lexer.SetKeywordMap(GetKeywords<Char>(fruits::lexer::Tag()));
return lexer;
}
template<typename Char>
soul::lexer::Lexer<FruitLexer<Char>, Char> MakeLexer(const std::string& moduleFileName, util::ResourceFlags resourceFlags, const Char* start, const Char* end, const std::string& fileName)
{
std::lock_guard<std::mutex> lock(MakeLexerMtx());
auto lexer = soul::lexer::Lexer<FruitLexer<Char>, Char>(start, end, fileName);
lexer.SetClassMap(GetClassMap<Char>(moduleFileName, resourceFlags, fruits::lexer::Tag()));
lexer.SetTokenCollection(GetTokens(fruits::lexer::Tag()));
lexer.SetKeywordMap(GetKeywords<Char>(fruits::lexer::Tag()));
return lexer;
}
}
-
The generator has generated the following C++ module implementation unit fruits.lexer.cpp
:
module fruits.lexer;
namespace fruits::lexer {
soul::ast::common::TokenCollection* GetTokens(fruits::lexer::Tag tag)
{
static soul::ast::common::TokenCollection tokens("fruits.lexer.tokens");
if (!tokens.Initialized())
{
tokens.SetInitialized();
tokens.AddToken(new soul::ast::common::Token(fruits::token::APPLE, "APPLE", "'apple'"));
tokens.AddToken(new soul::ast::common::Token(fruits::token::ORANGE, "ORANGE", "'orange'"));
tokens.AddToken(new soul::ast::common::Token(fruits::token::BANANA, "BANANA", "'banana'"));
tokens.AddToken(new soul::ast::common::Token(fruits::token::COMMA, "COMMA", ","));
tokens.AddToken(new soul::ast::common::Token(fruits::token::LPAREN, "LPAREN", "("));
tokens.AddToken(new soul::ast::common::Token(fruits::token::RPAREN, "RPAREN", ")"));
tokens.AddToken(new soul::ast::common::Token(fruits::token::ID, "ID", "identifier"));
}
return &tokens;
}
FruitLexer_Variables::FruitLexer_Variables()
{
}
std::mutex mtx;
std::mutex& MakeLexerMtx() { return mtx; }
template<>
soul::lexer::KeywordMap<char>* GetKeywords<char>(fruits::lexer::Tag tag)
{
static const soul::lexer::Keyword<char> keywords[] = {
{ "apple", fruits::token::APPLE },
{ "orange", fruits::token::ORANGE },
{ "banana", fruits::token::BANANA },
{ nullptr, -1 }
};
static soul::lexer::KeywordMap<char> keywordMap(keywords);
return &keywordMap;
}
template<>
soul::lexer::KeywordMap<char8_t>* GetKeywords<char8_t>(fruits::lexer::Tag tag)
{
static const soul::lexer::Keyword<char8_t> keywords[] = {
{ u8"apple", fruits::token::APPLE },
{ u8"orange", fruits::token::ORANGE },
{ u8"banana", fruits::token::BANANA },
{ nullptr, -1 }
};
static soul::lexer::KeywordMap<char8_t> keywordMap(keywords);
return &keywordMap;
}
template<>
soul::lexer::KeywordMap<char16_t>* GetKeywords<char16_t>(fruits::lexer::Tag tag)
{
static const soul::lexer::Keyword<char16_t> keywords[] = {
{ u"apple", fruits::token::APPLE },
{ u"orange", fruits::token::ORANGE },
{ u"banana", fruits::token::BANANA },
{ nullptr, -1 }
};
static soul::lexer::KeywordMap<char16_t> keywordMap(keywords);
return &keywordMap;
}
template<>
soul::lexer::KeywordMap<char32_t>* GetKeywords<char32_t>(fruits::lexer::Tag tag)
{
static const soul::lexer::Keyword<char32_t> keywords[] = {
{ U"apple", fruits::token::APPLE },
{ U"orange", fruits::token::ORANGE },
{ U"banana", fruits::token::BANANA },
{ nullptr, -1 }
};
static soul::lexer::KeywordMap<char32_t> keywordMap(keywords);
return &keywordMap;
}
}
Generating a parser
First we will create abstract syntax tree classes for the parsed fruits:
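A minimal sketch of such classes follows, based only on what the rest of the tutorial uses: a base class fruits::fruit with a print member function, and the derived classes apple, orange and banana. In the tutorial project these classes would live in the module interface unit fruits.cppm that exports the fruits module. The sketch uses a plain namespace named fruits_sketch so that it compiles on its own, and the name member function is a hypothetical helper.

```cpp
#include <iostream>
#include <string>

// Sketch only: a plain namespace replaces the 'fruits' module of the tutorial project.
namespace fruits_sketch {

class fruit
{
public:
    virtual ~fruit() = default;
    virtual std::string name() const = 0;        // hypothetical helper, assumed for this sketch
    void print() const { std::cout << name(); }  // the main program calls fruit->print()
};

class apple : public fruit
{
public:
    std::string name() const override { return "apple"; }
};

class orange : public fruit
{
public:
    std::string name() const override { return "orange"; }
};

class banana : public fruit
{
public:
    std::string name() const override { return "banana"; }
};

} // namespace fruits_sketch
```

The virtual destructor matters because the parsed fruits are owned through std::unique_ptr to the base class.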
Adding parser file:
The parser file has the following structure:
-
The export module declaration names the module that the parser generator will place the parser in. This parser will be placed in the fruits.parser module.
-
Next come the import declarations. An import can be prefixed with an interface or implementation specifier. An import prefixed with interface is placed in the module interface unit, and an import prefixed with implementation is placed in the module implementation unit. An import without a prefix is placed in the module interface unit. The parser imports the abstract syntax tree module defined in fruits.cppm. This import goes to the interface unit because the fruit class is referenced in the interface of the parsing rules FruitList and Fruit. The parser also imports the lexer module and the token module. These imports can go to the implementation unit because the lexer is not referenced in the parser interface and the tokens are referenced only in the bodies of the parsing rules.
-
Finally comes the parser declaration. The parser declaration can have:
-
lexer
statements
-
using
statements
-
a main
statement
-
parsing rules
The lexer statement specifies the lexer that the parser is instantiated with. There can be more than one lexer statement for a parser. In that case the lexers must produce the same tokens. A using statement imports a rule from another parser, so that it can be used in the same way as the rules declared inside the current parser. The main statement states that the parser is a main parser. The parser generator generates a Parse function for each main parser. There can be more than one main parser in the same project. The Parse function acts as an entry point for a parser. The rest of a parser consists of parsing rules. A parsing rule has a body; it can have parameters and local variables, and it can return a value. You can think of a parser as a class and of a parsing rule as a member function. In fact the parser is implemented as a struct whose static member functions are the parsing rules.
The FruitParser parser has two parsing rules: FruitList and Fruit. The first rule is special in that the generated Parse function calls it. The FruitList rule takes a pointer to a vector of unique pointers to fruits::fruit as a parameter and returns no value. The Fruit rule takes no parameters, has no local variables, and returns a fruits::fruit pointer. The body of a rule is separated from the interface of the rule by the ::= symbol, pronounced produces. The body of a rule is terminated by a semicolon.
The body of a rule consists of parsing expressions. The FruitList rule parses a possibly empty list of Fruit rules separated by COMMA tokens and enclosed in parentheses, that is, between the LPAREN and RPAREN tokens. Each time a Fruit rule matches, its return value is added to the vector of std::unique_ptr<fruits::fruit>.
The body of the Fruit rule consists of the tokens APPLE, ORANGE and BANANA separated by the choice parsing operator |. The choice parsing operator tries its operands from left to right. The semantic action, a C++ compound statement, attached to the first matching operand is executed.
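The two rules can be sketched as a small hand-written recursive descent parser. This is not the code that spg generates. The grammar in the comments is informal notation, and the Tok enumeration, the Fruit struct and the Parser class are hypothetical stand-ins for the token, syntax tree and parser modules. The saved position in the loop plays the same role as the position that generated parsers save and restore when an alternative or repetition fails.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace parser_sketch {

// Hypothetical token kinds and syntax tree node used only in this sketch.
enum class Tok { APPLE, ORANGE, BANANA, COMMA, LPAREN, RPAREN, ID, END };

struct Fruit { std::string name; };

class Parser
{
public:
    explicit Parser(std::vector<Tok> tokens) : tokens_(std::move(tokens)) {}

    // Entry point: the first rule must match and all input must be consumed.
    bool Parse(std::vector<std::unique_ptr<Fruit>>& fruitList)
    {
        return FruitList(fruitList) && Current() == Tok::END;
    }

private:
    Tok Current() const { return pos_ < tokens_.size() ? tokens_[pos_] : Tok::END; }

    // Consumes the current token if it is the expected one.
    bool Accept(Tok t)
    {
        if (Current() != t) return false;
        ++pos_;
        return true;
    }

    // Informally: FruitList ::= LPAREN (Fruit (COMMA Fruit)*)? RPAREN
    bool FruitList(std::vector<std::unique_ptr<Fruit>>& fruitList)
    {
        if (!Accept(Tok::LPAREN)) return false;
        if (auto first = ParseFruit())
        {
            fruitList.push_back(std::move(first));
            while (true)
            {
                const std::size_t save = pos_; // remember the position before trying COMMA Fruit
                if (!Accept(Tok::COMMA)) break;
                auto next = ParseFruit();
                if (!next) { pos_ = save; break; } // backtrack: give the COMMA back
                fruitList.push_back(std::move(next));
            }
        }
        return Accept(Tok::RPAREN);
    }

    // Informally: Fruit ::= APPLE | ORANGE | BANANA, alternatives tried from left to right.
    std::unique_ptr<Fruit> ParseFruit()
    {
        if (Accept(Tok::APPLE)) return std::make_unique<Fruit>(Fruit{ "apple" });
        if (Accept(Tok::ORANGE)) return std::make_unique<Fruit>(Fruit{ "orange" });
        if (Accept(Tok::BANANA)) return std::make_unique<Fruit>(Fruit{ "banana" });
        return nullptr;
    }

    std::vector<Tok> tokens_;
    std::size_t pos_ = 0;
};

} // namespace parser_sketch
```

In this sketch an input such as (apple,) is rejected: the failed Fruit after the comma restores the position, and the closing parenthesis check then sees the comma instead of RPAREN.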
Now a parser project file
can be created:
Now we can generate C++ source code for the parser:
-
Execute the parser generator spg
with the fruits.spg
argument:
C:\soul-5.1.0\fruits>spg -v fruits.spg
> C:/soul-5.1.0/fruits/fruits.spg
generating parsers for project 'fruits.spg'...
> C:/soul-5.1.0/fruits/fruits.parser
linking...
> C:/soul-5.1.0/fruits/fruits.parser
generating code...
==> C:/soul-5.1.0/fruits/fruits.parser.cppm
==> C:/soul-5.1.0/fruits/fruits.parser.cpp
generating rule name module...
==> C:/soul-5.1.0/fruits/fruits_rules.cppm
==> C:/soul-5.1.0/fruits/fruits_rules.cpp
parsers for project 'fruits.spg' generated successfully.
C:\soul-5.1.0\fruits>
-
The parser generator has generated the following C++ module interface unit fruits.parser.cppm
for fruits.parser
:
export module fruits.parser;
import std;
import soul.lexer;
import soul.parser;
import fruits;
export namespace fruits::parser {
template<typename LexerT>
struct FruitParser
{
static void Parse(LexerT& lexer, std::vector<std::unique_ptr<fruits::fruit>>* fruitList);
static soul::parser::Match FruitList(LexerT& lexer, std::vector<std::unique_ptr<fruits::fruit>>* fruitList);
static soul::parser::Match Fruit(LexerT& lexer);
};
}
The FruitParser
is a template that is instantiated in the C++ module implementation unit.
-
The generator has generated the following C++ module implementation unit fruits.parser.cpp
for fruits.parser
:
module fruits.parser;
import util;
import soul.ast.common;
import soul.ast.spg;
import fruits.lexer;
import fruits.token;
namespace fruits::parser {
template<typename LexerT>
void FruitParser<LexerT>::Parse(LexerT& lexer, std::vector<std::unique_ptr<fruits::fruit>>* fruitList)
{
#ifdef SOUL_PARSER_DEBUG_SUPPORT
if (lexer.Log())
{
lexer.Log()->WriteBeginRule("parse");
lexer.Log()->IncIndent();
}
#endif
++lexer;
soul::parser::Match match = FruitParser<LexerT>::FruitList(lexer, fruitList);
#ifdef SOUL_PARSER_DEBUG_SUPPORT
if (lexer.Log())
{
lexer.Log()->DecIndent();
lexer.Log()->WriteEndRule("parse");
}
#endif
if (match.hit)
{
if (*lexer == soul::lexer::END_TOKEN)
{
return;
}
else
{
lexer.ThrowFarthestError();
}
}
else
{
lexer.ThrowFarthestError();
}
return;
}
template<typename LexerT>
soul::parser::Match FruitParser<LexerT>::FruitList(LexerT& lexer, std::vector<std::unique_ptr<fruits::fruit>>* fruitList)
{
#ifdef SOUL_PARSER_DEBUG_SUPPORT
std::int64_t parser_debug_match_pos = 0;
bool parser_debug_write_to_log = lexer.Log() != nullptr;
if (parser_debug_write_to_log)
{
parser_debug_match_pos = lexer.GetPos();
soul::lexer::WriteBeginRuleToLog(lexer, "FruitList");
}
#endif
soul::lexer::RuleGuard<LexerT> ruleGuard(lexer, 1391209262306295809);
std::unique_ptr<fruits::fruit> first;
std::unique_ptr<fruits::fruit> next;
soul::parser::Match match(false);
soul::parser::Match* parentMatch0 = &match;
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch1 = &match;
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch2 = &match;
{
soul::parser::Match match(false);
if (*lexer == fruits::token::LPAREN)
{
++lexer;
match.hit = true;
}
*parentMatch2 = match;
}
if (match.hit)
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch3 = &match;
{
soul::parser::Match match(true);
std::int64_t save = lexer.GetPos();
soul::parser::Match* parentMatch4 = &match;
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch5 = &match;
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch6 = &match;
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch7 = &match;
{
std::int64_t pos = lexer.GetPos();
soul::parser::Match match = fruits::parser::FruitParser<LexerT>::Fruit(lexer);
first.reset(static_cast<fruits::fruit*>(match.value));
if (match.hit)
{
fruitList->push_back(std::unique_ptr < fruits::fruit > (first.release()));
}
*parentMatch7 = match;
}
*parentMatch6 = match;
}
if (match.hit)
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch8 = &match;
{
soul::parser::Match match(true);
soul::parser::Match* parentMatch9 = &match;
{
while (true)
{
std::int64_t save = lexer.GetPos();
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch10 = &match;
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch11 = &match;
{
soul::parser::Match match(false);
if (*lexer == fruits::token::COMMA)
{
++lexer;
match.hit = true;
}
*parentMatch11 = match;
}
if (match.hit)
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch12 = &match;
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch13 = &match;
{
std::int64_t pos = lexer.GetPos();
soul::parser::Match match = fruits::parser::FruitParser<LexerT>::Fruit(lexer);
next.reset(static_cast<fruits::fruit*>(match.value));
if (match.hit)
{
fruitList->push_back(std::unique_ptr < fruits::fruit > (next.release()));
}
*parentMatch13 = match;
}
*parentMatch12 = match;
}
*parentMatch11 = match;
}
*parentMatch10 = match;
}
if (match.hit)
{
*parentMatch9 = match;
}
else
{
lexer.SetPos(save);
break;
}
}
}
}
*parentMatch8 = match;
}
*parentMatch6 = match;
}
*parentMatch5 = match;
}
if (match.hit)
{
*parentMatch4 = match;
}
else
{
lexer.SetPos(save);
}
}
*parentMatch3 = match;
}
*parentMatch2 = match;
}
*parentMatch1 = match;
}
if (match.hit)
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch14 = &match;
{
soul::parser::Match match(false);
if (*lexer == fruits::token::RPAREN)
{
++lexer;
match.hit = true;
}
*parentMatch14 = match;
}
*parentMatch1 = match;
}
*parentMatch0 = match;
}
#ifdef SOUL_PARSER_DEBUG_SUPPORT
if (parser_debug_write_to_log)
{
if (match.hit) soul::lexer::WriteSuccessToLog(lexer, parser_debug_match_pos, "FruitList");
else soul::lexer::WriteFailureToLog(lexer, "FruitList");
}
#endif
if (!match.hit)
{
match.value = nullptr;
}
return match;
}
template<typename LexerT>
soul::parser::Match FruitParser<LexerT>::Fruit(LexerT& lexer)
{
#ifdef SOUL_PARSER_DEBUG_SUPPORT
std::int64_t parser_debug_match_pos = 0;
bool parser_debug_write_to_log = lexer.Log() != nullptr;
if (parser_debug_write_to_log)
{
parser_debug_match_pos = lexer.GetPos();
soul::lexer::WriteBeginRuleToLog(lexer, "Fruit");
}
#endif
soul::lexer::RuleGuard<LexerT> ruleGuard(lexer, 1391209262306295810);
soul::parser::Match match(false);
soul::parser::Match* parentMatch0 = &match;
{
std::int64_t save = lexer.GetPos();
soul::parser::Match match(false);
soul::parser::Match* parentMatch1 = &match;
{
std::int64_t save = lexer.GetPos();
soul::parser::Match match(false);
soul::parser::Match* parentMatch2 = &match;
{
std::int64_t pos = lexer.GetPos();
soul::parser::Match match(false);
if (*lexer == fruits::token::APPLE)
{
++lexer;
match.hit = true;
}
if (match.hit)
{
{
#ifdef SOUL_PARSER_DEBUG_SUPPORT
if (parser_debug_write_to_log) soul::lexer::WriteSuccessToLog(lexer, parser_debug_match_pos, "Fruit");
#endif
return soul::parser::Match(true, new fruits::apple());
}
}
*parentMatch2 = match;
}
*parentMatch1 = match;
if (!match.hit)
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch3 = &match;
lexer.SetPos(save);
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch4 = &match;
{
std::int64_t pos = lexer.GetPos();
soul::parser::Match match(false);
if (*lexer == fruits::token::ORANGE)
{
++lexer;
match.hit = true;
}
if (match.hit)
{
{
#ifdef SOUL_PARSER_DEBUG_SUPPORT
if (parser_debug_write_to_log) soul::lexer::WriteSuccessToLog(lexer, parser_debug_match_pos, "Fruit");
#endif
return soul::parser::Match(true, new fruits::orange());
}
}
*parentMatch4 = match;
}
*parentMatch3 = match;
}
*parentMatch1 = match;
}
}
*parentMatch0 = match;
if (!match.hit)
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch5 = &match;
lexer.SetPos(save);
{
soul::parser::Match match(false);
soul::parser::Match* parentMatch6 = &match;
{
std::int64_t pos = lexer.GetPos();
soul::parser::Match match(false);
if (*lexer == fruits::token::BANANA)
{
++lexer;
match.hit = true;
}
if (match.hit)
{
{
#ifdef SOUL_PARSER_DEBUG_SUPPORT
if (parser_debug_write_to_log) soul::lexer::WriteSuccessToLog(lexer, parser_debug_match_pos, "Fruit");
#endif
return soul::parser::Match(true, new fruits::banana());
}
}
*parentMatch6 = match;
}
*parentMatch5 = match;
}
*parentMatch0 = match;
}
}
#ifdef SOUL_PARSER_DEBUG_SUPPORT
if (parser_debug_write_to_log)
{
if (match.hit) soul::lexer::WriteSuccessToLog(lexer, parser_debug_match_pos, "Fruit");
else soul::lexer::WriteFailureToLog(lexer, "Fruit");
}
#endif
if (!match.hit)
{
match.value = nullptr;
}
return match;
}
template struct FruitParser<soul::lexer::Lexer<fruits::lexer::FruitLexer<char32_t>, char32_t>>;
}
Because the FruitParser
is defined as a main
parser the generator has generated a Parse
function that calls the first rule of the parser. The FruitParser
is explicitly instantiated with lexers defined in the lexer
statements of the parser.
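The pattern of declaring a template in an interface, defining its members elsewhere and explicitly instantiating it can be sketched in a single file. The Parser template and the Utf32Lexer type below are hypothetical stand-ins; only the form of the last declaration mirrors the explicit instantiation at the end of fruits.parser.cpp.

```cpp
#include <string>

namespace instantiation_sketch {

// Declaration, as it would appear in an interface: the member function is declared only.
template<typename LexerT>
struct Parser
{
    static std::string Parse(LexerT& lexer);
};

// Hypothetical lexer type standing in for a concrete lexer instantiation.
struct Utf32Lexer
{
    std::string name() const { return "utf32"; }
};

// Definition, as it would appear in an implementation unit.
template<typename LexerT>
std::string Parser<LexerT>::Parse(LexerT& lexer)
{
    return "parsed with " + lexer.name();
}

// Explicit instantiation: code is generated for exactly this template argument.
template struct Parser<Utf32Lexer>;

} // namespace instantiation_sketch
```

When the definitions are visible only in the implementation unit, a client that uses the parser with a lexer type that was not explicitly instantiated would typically fail to link, which is presumably why each lexer has to be named in a lexer statement.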
Now we can add the generated files to the fruits
C++ project:
-
Right-click the fruits
project and select Add Existing Item...
.
-
Add the following files:
-
fruits.cppm
.
-
fruits.lexer.classmap.rc
-
fruits.lexer.cpp
-
fruits.lexer.cppm
-
fruits.parser.cpp
-
fruits.parser.cppm
-
fruits.token.cppm
.
-
Then right-click the fruits
project and select Build
.
We will now complete the main program:
-
Double-click the main.cpp
file and edit the contents to be the following:
import std;
import fruits;
import fruits.lexer;
import fruits.parser;
import fruits.token;
import util;
std::vector<std::unique_ptr<fruits::fruit>> parse_fruit_list(const std::string& fruitListText)
{
std::u32string fruitList = util::ToUtf32(fruitListText);
auto lexer = fruits::lexer::MakeLexer(fruitList.c_str(), fruitList.c_str() + fruitList.length(), "<fruitList>");
using LexerType = decltype(lexer);
std::vector<std::unique_ptr<fruits::fruit>> fruitVec;
fruits::parser::FruitParser<LexerType>::Parse(lexer, &fruitVec);
return fruitVec;
}
int main()
{
std::vector<std::unique_ptr<fruits::fruit>> fruitVec = parse_fruit_list("(apple, orange, banana)");
std::cout << "(";
bool first = true;
for (const auto& fruit : fruitVec)
{
if (first)
{
first = false;
}
else
{
std::cout << ", ";
}
fruit->print();
}
std::cout << ")" << "\n";
}
The parse_fruit_list function converts its text argument to a UTF-32 string, creates a lexer by calling the MakeLexer function, creates an empty vector of fruits, and parses the UTF-32 string containing the fruit list into the vector. The main function calls the parse_fruit_list function and then prints the parsed fruits.
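The generated parser is explicitly instantiated only for the char32_t lexer, so the input has to be converted to UTF-32 before lexing. The following sketch shows what such a conversion involves. It is not the implementation of util::ToUtf32; the function to_utf32 is a hypothetical, simplified decoder that rejects truncated or malformed sequences but does not check for overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

namespace utf_sketch {

// Decodes a UTF-8 string into UTF-32 code points (simplified, see the assumptions above).
inline std::u32string to_utf32(const std::string& utf8)
{
    std::u32string result;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes that follow the lead byte
        if (lead < 0x80) { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= utf8.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80) throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F); // append the six payload bits
        }
        result.push_back(cp);
        i += extra + 1;
    }
    return result;
}

} // namespace utf_sketch
```

For pure ASCII input such as the fruit list, every byte maps to one code point with the same value.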
Now we will edit the program configuration:
-
Right-click the fruits project and select Properties.
-
Select the Debug
configuration.
-
Select the General
property sheet.
-
Set fruitsd
as the Target Name
.
-
Select the Release
configuration.
-
Set fruits
as the Target Name
.
-
Click OK
to accept changes.
Now you can build the fruits
project for the x64 | Debug
and x64 | Release
configurations.
Testing:
-
Start a terminal in the C:\soul-5.1.0\inst directory and run inst.bat:
C:\soul-5.1.0\inst>inst.bat
C:/soul-5.1.0/x64/Debug/fruitsd.exe -> C:/soul-5.1.0/bin/fruitsd.exe
C:/soul-5.1.0/x64/Release/fruits.exe -> C:/soul-5.1.0/bin/fruits.exe
source directory 'C:/soul-5.1.0/tools/otava/projects/soul/oslg/bin/release/2' does not exist
has only single valid path
source directory 'C:/soul-5.1.0/tools/otava/projects/soul/ospg/bin/release/2' does not exist
has only single valid path
source directory 'C:/soul-5.1.0/tools/otava/projects/compiler/ooc/bin/release/2' does not exist
has only single valid path
C:\soul-5.1.0\inst>
The compiled executable fruits.exe and its debug version fruitsd.exe are copied to the bin directory.
-
Run the fruits executable:
C:\soul-5.1.0\inst>fruits
(apple, orange, banana)
C:\soul-5.1.0\inst>
This completes the tutorial.