Introduction

The SoulNG project contains two C++ tools: a lexical analyzer generator slg and a parser generator spg.

The generators and the lexical analyzers and parsers they produce use four libraries that are also included in the project. The soulng/cppcode library represents C++ code in the generators. The soulng/lexer library contains a base class for generated lexical analyzer classes. The soulng/parser library contains two small classes needed in parsing. The soulng/util library contains common utilities, utilities for manipulating Unicode strings, and an interface to a Unicode character database that is also included in the project. The Unicode character database is a binary file located in the unicode subdirectory.

The tools and libraries of the SoulNG project are implemented and tested with Microsoft Visual Studio Community Edition 2019 for Windows, version 16.11.5, using the x64 configuration and with the Boost version 1.77 libraries installed. The Visual C++ compiler is needed to use these tools and libraries on Windows. The tools and libraries are also tested on WSL with the GNU C++ compiler, g++ version 10.3.0, and the Boost version 1.71.0 libraries installed. A C++ compiler and the Boost libraries are needed to use these tools and libraries on Linux or WSL.

Lexical Analyzer Generator

The slg tool takes a .lexer file that contains the description of a lexical analyzer as input and produces C++ source code for a lexical analyzer as output. The lexical analyzer will tokenize an input string passed to it as a UTF-32 encoded string. The input string may contain the content of a UTF-8 encoded text file, for example. The util library provides functions for reading a text file into a string and converting that string to a UTF-32 string.
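To illustrate the conversion step described above, here is a minimal sketch of decoding UTF-8 bytes into a UTF-32 string using only the C++ standard library. The function name DecodeUtf8 is hypothetical and is not the soulng/util API; the util library provides its own functions for this purpose. The sketch rejects malformed lead and continuation bytes but does not check for overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (NOT part of soulng/util): decodes a UTF-8 encoded
// byte string into a sequence of UTF-32 code points.
std::u32string DecodeUtf8(const std::string& utf8)
{
    std::u32string result;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t codePoint = 0;
        std::size_t extra = 0; // number of continuation bytes that follow
        if (lead < 0x80) { codePoint = lead; extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { codePoint = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { codePoint = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { codePoint = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + extra >= utf8.size())
        {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k)
        {
            unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
            {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            // Each continuation byte contributes its low six bits.
            codePoint = (codePoint << 6) | (c & 0x3F);
        }
        result.push_back(codePoint);
        i += extra + 1;
    }
    return result;
}
```

A caller would read the file content into a std::string first and pass the decoded std::u32string to the lexical analyzer.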

The description of the lexical analyzer in a .lexer file consists of token declarations, regular expression patterns, and semantic actions. The semantic actions are C++ statements that connect the patterns to the tokens. The lexical analyzer generated by slg is a finite state machine that recognizes the patterns and returns the corresponding tokens to the parser.
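The idea of a lexical analyzer as a finite state machine can be sketched by hand as follows. This is not the code that slg generates, and the names TokenId, Token, State and Tokenize are hypothetical; the sketch only assumes two patterns, an identifier and a number, and reports any other non-space character as an unknown token.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token identifiers; a real .lexer file declares its own tokens.
enum class TokenId { id, number, unknown };

struct Token { TokenId id; std::u32string text; };

// States of the hand-written machine.
enum class State { start, inId, inNumber };

bool IsLetter(char32_t c) { return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') || c == U'_'; }
bool IsDigit(char32_t c) { return c >= U'0' && c <= U'9'; }
bool IsSpace(char32_t c) { return c == U' ' || c == U'\t' || c == U'\n' || c == U'\r'; }

std::vector<Token> Tokenize(const std::u32string& input)
{
    std::vector<Token> tokens;
    State state = State::start;
    std::u32string lexeme;
    // The loop runs one step past the last character so that a token
    // still being collected at the end of input is emitted.
    for (std::size_t i = 0; i <= input.size(); )
    {
        bool atEnd = i == input.size();
        char32_t c = atEnd ? U'\0' : input[i];
        switch (state)
        {
            case State::start:
                if (atEnd || IsSpace(c)) { ++i; }
                else if (IsLetter(c)) { lexeme.assign(1, c); state = State::inId; ++i; }
                else if (IsDigit(c)) { lexeme.assign(1, c); state = State::inNumber; ++i; }
                else { tokens.push_back({ TokenId::unknown, std::u32string(1, c) }); ++i; }
                break;
            case State::inId:
                if (!atEnd && (IsLetter(c) || IsDigit(c))) { lexeme.push_back(c); ++i; }
                else { tokens.push_back({ TokenId::id, lexeme }); state = State::start; } // c is not consumed
                break;
            case State::inNumber:
                if (!atEnd && IsDigit(c)) { lexeme.push_back(c); ++i; }
                else { tokens.push_back({ TokenId::number, lexeme }); state = State::start; } // c is not consumed
                break;
        }
    }
    return tokens;
}
```

The pushes onto the token list play the role that semantic actions play in a .lexer file: they decide which token a recognized pattern produces.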

Parser Generator

The spg tool takes as input an .spg file, which is a container file that references a number of .parser files, together with the referenced .parser files, which contain the descriptions of parsers. It produces C++ source code for parser classes as output. The parser classes use a lexical analyzer generated with slg to tokenize the input. The parsers contain no state: they are C++ classes that consist of static member functions that call each other recursively. Together the parser classes form a recursive descent, top-down, backtracking parser.
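The shape of such stateless parser classes can be sketched by hand as follows. This is not spg output: the grammar, the Input struct, and the class names NameParser, CallParser and ExprParser are hypothetical, and the sketch reads characters directly instead of tokens from a generated lexical analyzer. The assumed grammar is Expr ::= Call | Name, Call ::= Name '(' Expr ')', Name ::= [a-z]+. Both alternatives of Expr begin with a Name, so the parser must backtrack when the first alternative fails.

```cpp
#include <cstddef>
#include <string>

// All parsing state lives in the input object passed between the functions;
// the parser classes themselves have no data members.
struct Input { std::u32string text; std::size_t pos; };

// Consumes the character c if it is next in the input.
bool Match(Input& in, char32_t c)
{
    if (in.pos < in.text.size() && in.text[in.pos] == c) { ++in.pos; return true; }
    return false;
}

struct NameParser
{
    // Name ::= [a-z]+
    static bool Parse(Input& in)
    {
        std::size_t start = in.pos;
        while (in.pos < in.text.size() && in.text[in.pos] >= U'a' && in.text[in.pos] <= U'z') ++in.pos;
        return in.pos > start;
    }
};

struct ExprParser
{
    static bool Parse(Input& in); // defined below, after CallParser
};

struct CallParser
{
    // Call ::= Name '(' Expr ')'
    static bool Parse(Input& in)
    {
        return NameParser::Parse(in) && Match(in, U'(') && ExprParser::Parse(in) && Match(in, U')');
    }
};

// Expr ::= Call | Name
bool ExprParser::Parse(Input& in)
{
    std::size_t save = in.pos;
    if (CallParser::Parse(in)) return true;
    in.pos = save; // backtrack: undo whatever the failed alternative consumed
    if (NameParser::Parse(in)) return true;
    in.pos = save;
    return false;
}

// Returns true if the whole text is a valid Expr.
bool Recognize(const std::u32string& text)
{
    Input in{ text, 0 };
    return ExprParser::Parse(in) && in.pos == in.text.size();
}
```

For the input f(g(x)), the innermost Expr first tries Call, fails after reading x because no parenthesis follows, restores the saved position, and then succeeds with the Name alternative.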

Example

The examples/minilang directory contains a Visual Studio project that includes a lexical analyzer and a parser for a very small C-like language called Minilang. The tutorials included in this documentation guide the reader through the development of the lexical analyzer and the parser for Minilang in stages.

Projects

The deployment package contains source code for the following high-level project modules that all use the soulng core module:

Arrows represent dependencies between the modules.

The package contains a C++ front-end library module sngcpp, an XML and XPath parsing library module sngxml, and a Cmajor language library module sngcm that is also part of the Cmajor compiler. The documentation and project conversion utilities 'gendoc', 'sng2html' and 'cpp2cm' are also included.