Implementation

Operation of the lexical analyzer generator

First regular expressions of the lexical analyzer are turned to nondeterministic finite automata, NFA. Then the input alphabet, essentially Unicode characters, is partitioned into character classes with respect to transitions of the automata. Each character class is assigned a unique integer identifier. A map from Unicode characters to character class identifiers, a classmap, is created. Then the automata are combined to form one big NFA by creating a new start state and adding epsilon-transitions from the new start state to the start states of each small NFA. Finally the big NFA is compiled to a deterministic finite automaton, DFA, using the subset construction algorithm. The DFA has transitions for each character class. The generated lexical analyzer, the DFA, operates by getting an input character, getting its character class and moving from some state to another. When the DFA is in an accepting state, it has recognized some input token that corresponds to some regular expression.

Operation of the parser generator

The parser generator generates a top-down backtracking recursive descent parser from input grammars. The closest formalism for a Soul grammar is a Parsing Expression Grammar or PEG . The choice operator of the Soul grammar is equivalent to the prioritized choice of the PEG. Soul grammar does not have not-predicate that PEGs have. On the other hand PEG does not have the positive closure , optional and lookahead operators that Soul grammars have.

First the parser generator parses given SPG, token and parser files. Then it links the parser files together and resolves the using statements in the parsers. Then if the optimize option is specified, it turns the choice operations to switch-case parsers for those choice operands that have disjoint FIRST sets, that is: if they have mutually exclusive starting tokens. Then the generator visits the parsing expressions contained by the parsers two times: once for generating code for the interface module unit and once for the implementation module unit.