C++ Interface of the Lexer

The interface of the lexical analyzer has two main operations. If a variable or parameter named lexer has the type of the generated lexical analyzer (or is a reference to it), then the expression *lexer returns the identifier of the current token, and the expression ++lexer advances the lexer to the next token. In addition, the lexer has a member variable named token that contains the matched lexeme: a pair of pointers to the beginning and end of the matched characters.
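
The two operations lend themselves to a simple scanning loop. The following is a minimal sketch only: ToyLexer, Lexeme, ScanAll and the token identifiers END, ID and NUMBER are hypothetical stand-ins invented for this example, not the generated lexer's actual declarations. The toy lexer treats whitespace as the only separator and models just *lexer, ++lexer and the token member.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token identifiers; a generated lexer defines its own.
const int END = 0;
const int ID = 1;
const int NUMBER = 2;

// Hypothetical lexeme type: pointers to the beginning and end of the match.
struct Lexeme
{
    const char* begin = nullptr;
    const char* end = nullptr;
    std::string ToString() const { return std::string(begin, end); }
};

// Toy stand-in for a generated lexer. The content string must outlive the lexer,
// because the lexeme only points into it.
class ToyLexer
{
public:
    explicit ToyLexer(const std::string& content) :
        pos(content.c_str()), endPos(content.c_str() + content.size())
    {
        Scan();
    }
    int operator*() const { return current; }   // identifier of the current token
    void operator++() { Scan(); }               // advance to the next token
    Lexeme token;                               // the matched lexeme
private:
    static bool IsSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    void Scan()
    {
        while (pos != endPos && IsSpace(*pos)) ++pos;   // skip separators
        token.begin = pos;
        if (pos == endPos) { token.end = pos; current = END; return; }
        current = std::isdigit(static_cast<unsigned char>(*pos)) ? NUMBER : ID;
        while (pos != endPos && !IsSpace(*pos)) ++pos;
        token.end = pos;
    }
    const char* pos;
    const char* endPos;
    int current = END;
};

// Reads token identifiers with *lexer and advances with ++lexer until END.
std::vector<std::pair<int, std::string>> ScanAll(ToyLexer& lexer)
{
    std::vector<std::pair<int, std::string>> result;
    while (*lexer != END)
    {
        result.emplace_back(*lexer, lexer.token.ToString());
        ++lexer;
    }
    return result;
}
```

In this sketch, scanning the text "x 42 y" yields three pairs: an ID with lexeme x, a NUMBER with lexeme 42, and an ID with lexeme y.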

As it moves forward in the stream of tokens, the lexer saves the scanned non-separator tokens, which are indexed starting from zero. The GetPos() member function of the lexer returns the position, or index, of the current token together with the current line number, and the SetPos() member function sets the lexer to the given position in the token stream, so that the *lexer operation returns the identifier of the token at that position. In this way the parser can backtrack to a previous position and rescan tokens from that position onward. The GetToken() member function of the lexer returns the token at the given position.
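
Backtracking with GetPos() and SetPos() can be sketched as follows. This is an illustration under stated assumptions, not the generated lexer's code: ToyLexer, Token, Pos, MatchIdNumber and the token identifiers are hypothetical, the toy lexer scans all tokens eagerly instead of saving them as it advances, and a small struct stands in for whatever value the real GetPos() returns.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token identifiers.
const int END = 0;
const int ID = 1;
const int NUMBER = 2;

struct Token { int id; std::string text; int line; };

// Hypothetical position value: token index plus line number.
struct Pos { std::size_t index; int line; };

// Toy stand-in for a generated lexer; whitespace is the only separator.
class ToyLexer
{
public:
    explicit ToyLexer(const std::string& content)
    {
        int line = 1;
        std::size_t i = 0;
        while (i < content.size())
        {
            unsigned char c = static_cast<unsigned char>(content[i]);
            if (std::isspace(c)) { if (c == '\n') ++line; ++i; continue; }
            std::size_t start = i;
            while (i < content.size() && !std::isspace(static_cast<unsigned char>(content[i]))) ++i;
            // Save the non-separator token; its index is its place in the vector.
            tokens.push_back(Token{ std::isdigit(c) ? NUMBER : ID, content.substr(start, i - start), line });
        }
        endLine = line;
    }
    int operator*() const { return current < tokens.size() ? tokens[current].id : END; }
    void operator++() { if (current < tokens.size()) ++current; }
    Pos GetPos() const { return Pos{ current, current < tokens.size() ? tokens[current].line : endLine }; }
    void SetPos(const Pos& pos) { current = pos.index; }
    const Token& GetToken(std::size_t index) const { return tokens.at(index); }
private:
    std::vector<Token> tokens;
    std::size_t current = 0;
    int endLine = 1;
};

// Tries to match ID followed by NUMBER. On failure the lexer is rewound
// to the position it had on entry, so the caller can try another alternative.
bool MatchIdNumber(ToyLexer& lexer)
{
    Pos save = lexer.GetPos();
    if (*lexer == ID)
    {
        ++lexer;
        if (*lexer == NUMBER)
        {
            ++lexer;
            return true;
        }
    }
    lexer.SetPos(save);   // backtrack
    return false;
}
```

The design point is that a failed alternative leaves no trace: after MatchIdNumber() returns false, *lexer yields the same token identifier as before the call.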

A span contains a range of token indices. The GetSpan() member function of the lexer returns a span that has the current token index as both its start and end index. After setting the start and/or end index of a span, the parser can call the GetMatch() member function of the lexer with that span, and the lexer returns a string that contains all the matched characters, including separators, for that span of tokens.
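
The relationship between spans and GetMatch() can be sketched as follows. ToyLexer, Span and TextFrom are hypothetical names made up for this example, and the sketch assumes that the end index of a span is inclusive; the real lexer's types and conventions may differ.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical span type: inclusive range of token indices (an assumption).
struct Span { std::size_t start; std::size_t end; };

// Toy stand-in for a generated lexer; whitespace is the only separator.
class ToyLexer
{
public:
    explicit ToyLexer(const std::string& content_) : content(content_)
    {
        std::size_t i = 0;
        while (i < content.size())
        {
            if (std::isspace(static_cast<unsigned char>(content[i]))) { ++i; continue; }
            std::size_t start = i;
            while (i < content.size() && !std::isspace(static_cast<unsigned char>(content[i]))) ++i;
            tokens.push_back(Range{ start, i });   // character offsets of one token
        }
    }
    void operator++() { if (current < tokens.size()) ++current; }
    // Span whose start and end are both the current token index.
    Span GetSpan() const { return Span{ current, current }; }
    // All characters from the first token of the span to the last one,
    // including the separators between them.
    std::string GetMatch(const Span& span) const
    {
        if (span.start > span.end || span.end >= tokens.size()) return std::string();
        std::size_t first = tokens[span.start].begin;
        std::size_t last = tokens[span.end].end;
        return content.substr(first, last - first);
    }
private:
    struct Range { std::size_t begin; std::size_t end; };
    std::string content;
    std::vector<Range> tokens;
    std::size_t current = 0;
};

// Returns the source text from the token where `first` starts up to the current token.
std::string TextFrom(const ToyLexer& lexer, const Span& first)
{
    Span span = first;
    span.end = lexer.GetSpan().end;   // extend the span to the current token
    return lexer.GetMatch(span);
}
```

A parser could, for example, take a span at the first token of a construct and extend its end index once the last token has been reached, in order to recover the source text of the whole construct.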