
2.3 Testing the Lexer

Our Main.cpp source file should now contain only the following content:

        #include <iostream>

        int main()
        {
            return 0;
        }
    

Library Initialization

The soulng/util library needs dynamic initialization, so we add the following initialization code to Main.cpp:

        #include <soulng/util/InitDone.hpp>
        #include <iostream>

        void InitApplication()
        {
            soulng::util::Init();
        }

        void DoneApplication()
        {
            soulng::util::Done();
        }
	

Error Handling

The soulng libraries report errors by throwing exceptions of type std::runtime_error, so the next step is to handle standard exceptions in the main function. We place the initialization call inside the try block as well, because it can also throw an exception (previous versions of this project got this wrong):

        #include <soulng/util/InitDone.hpp>
        #include <iostream>
        #include <stdexcept>

        int main()
        {
            try
            {
                InitApplication();
            }
            catch (const std::exception& ex)
            {
                std::cerr << ex.what() << std::endl;
                return 1;
            }
            DoneApplication();
            return 0;
        }
    

We have included the <stdexcept> header and added a try block whose catch handler prints the error message to the standard error stream and returns a nonzero exit code.
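
The pattern can be tried in isolation without the soulng libraries. The following is a minimal sketch, not soulng code: FailingInit is a hypothetical stand-in for an initialization function that throws, and RunGuarded is a hypothetical helper that converts an exception to an exit code the same way the main function above does.

```cpp
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical stand-in for an initialization step that fails.
void FailingInit()
{
    throw std::runtime_error("initialization failed");
}

// Hypothetical helper: runs 'init' inside a try block and converts
// any standard exception to exit code 1, like the main function above.
int RunGuarded(const std::function<void()>& init)
{
    try
    {
        init();
    }
    catch (const std::exception& ex)
    {
        std::cerr << ex.what() << std::endl;
        return 1;
    }
    return 0;
}
```

Calling RunGuarded(FailingInit) prints the message to the standard error stream and returns 1. If the call to init() were outside the try block, the exception would escape and terminate the program instead.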

Tester Function

We will now write a tester function that will read a test file, construct a lexer and print the tokens contained in the test file to the standard output:

        // ...
        #include <soulng/util/MappedInputFile.hpp>

        void TestMinilangLexer(const std::string& minilangFilePath)
        {
            std::cout << "> " << minilangFilePath << std::endl;
            std::string s = soulng::util::ReadFile(minilangFilePath);
            // ...
        }
    

First we print the path of the given test file to the standard output. We then call the ReadFile() utility function of the soulng/util library, which reads the contents of the given UTF-8 encoded text file into a std::string. We have included the <soulng/util/MappedInputFile.hpp> header that contains the declaration of the ReadFile function.

The lexer operates internally with UTF-32 characters, so we need to convert the UTF-8 string to UTF-32:

        // ...
        #include <soulng/util/Unicode.hpp>

        void TestMinilangLexer(const std::string& minilangFilePath)
        {
            // ...
            std::string s = soulng::util::ReadFile(minilangFilePath);
            std::u32string content = soulng::unicode::ToUtf32(s);
            // ...
        }
    

We call the ToUtf32() utility function of the soulng/util library to do the conversion. We have included the <soulng/util/Unicode.hpp> header that contains the declaration of the ToUtf32 function.
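
To illustrate what such a conversion involves, here is a minimal sketch of a UTF-8 decoder. It is not the soulng implementation: DecodeUtf8 is a hypothetical helper written only for this illustration. It rejects truncated sequences and bad continuation bytes, but it does not check for overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of soulng: decodes UTF-8 bytes to UTF-32 code points.
std::u32string DecodeUtf8(const std::string& utf8)
{
    std::u32string result;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t codePoint = 0;
        std::size_t trailing = 0; // number of continuation bytes that follow the lead byte
        if (lead < 0x80) { codePoint = lead; }
        else if ((lead & 0xE0) == 0xC0) { codePoint = lead & 0x1F; trailing = 1; }
        else if ((lead & 0xF0) == 0xE0) { codePoint = lead & 0x0F; trailing = 2; }
        else if ((lead & 0xF8) == 0xF0) { codePoint = lead & 0x07; trailing = 3; }
        else { throw std::runtime_error("invalid UTF-8 lead byte"); }
        if (i + trailing >= utf8.size())
        {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= trailing; ++k)
        {
            unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
            {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            // each continuation byte contributes its low six bits
            codePoint = (codePoint << 6) | (c & 0x3F);
        }
        result.push_back(codePoint);
        i += trailing + 1;
    }
    return result;
}
```

For example, the two bytes 0xC3 0xB6 decode to the single code point U+00F6 (the letter ö), so every character of an identifier occupies exactly one element of the resulting std::u32string.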

Now we construct the lexer and have it read the first token:

        // ...
        #include <minilang/MinilangLexer.hpp>

        void TestMinilangLexer(const std::string& minilangFilePath)
        {
            // ...
            std::u32string content = soulng::unicode::ToUtf32(s);
            MinilangLexer lexer(content, minilangFilePath, 0);
            ++lexer;
            // ...

    

The lexer constructor takes a UTF-32 string that will be tokenized, a file path that is included in error messages produced by the lexer, and a file index that is included in spans generated by the lexer. The file index is not used by this example, so we set it to 0. The ++lexer call is necessary to have the lexer produce the first token.

Next we test whether the lexer has reached the end of the input, and if not, get the match from the lexer:

        // ...
        #include <minilang/MinilangTokens.hpp>

        void TestMinilangLexer(const std::string& minilangFilePath)
        {
            // ...
            ++lexer;
            while (*lexer != MinilangTokens::END)
            {
                std::u32string match = lexer.token.match.ToString();
                // ...
    

The *lexer expression returns the identifier of the current token. We compare it to the END token that represents end of input.

If the end of input has not been reached, we print the name of the matched token and the matching string:

        // ...
        void TestMinilangLexer(const std::string& minilangFilePath)
        {
            // ...
                std::u32string match = lexer.token.match.ToString();
                std::cout << MinilangTokens::GetTokenName(*lexer) << "(" << soulng::unicode::ToUtf8(match) << ")" << std::endl;
            // ...
    

The GetTokenName() function returns the name of the token with the given identifier as a string.

Finally we advance the lexer to the next token with the ++lexer expression and test the while condition again. When the end of input has been reached, we print a message to the standard output:

        // ...
        void TestMinilangLexer(const std::string& minilangFilePath)
        {
            // ...
            while (*lexer != MinilangTokens::END)
            {
                // ...
                ++lexer;
            }
            std::cout << "end of file '" << minilangFilePath << "' reached" << std::endl;
        }
    

Here's the complete test function:

        #include <minilang/MinilangLexer.hpp>
        #include <minilang/MinilangTokens.hpp>
        #include <soulng/util/InitDone.hpp>
        #include <soulng/util/MappedInputFile.hpp>
        #include <soulng/util/Unicode.hpp>
        #include <iostream>
        #include <stdexcept>

        void TestMinilangLexer(const std::string& minilangFilePath)
        {
            std::cout << "> " << minilangFilePath << std::endl;
            std::string s = soulng::util::ReadFile(minilangFilePath);
            std::u32string content = soulng::unicode::ToUtf32(s);
            MinilangLexer lexer(content, minilangFilePath, 0);
            ++lexer;
            while (*lexer != MinilangTokens::END)
            {
                std::u32string match = lexer.token.match.ToString();
                std::cout << MinilangTokens::GetTokenName(*lexer) << "(" << soulng::unicode::ToUtf8(match) << ")" << std::endl;
                ++lexer;
            }
            std::cout << "end of file '" << minilangFilePath << "' reached" << std::endl;
        }
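
The ++lexer / *lexer protocol used by the test function can be tried without the generated lexer. The following is a minimal sketch under stated assumptions: ToyLexer, ToyTokens and Tokenize are hypothetical names that do not exist in soulng, and the toy lexer recognizes only ASCII identifiers and integer literals separated by whitespace.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token identifiers; the real ones are declared in MinilangTokens.hpp.
namespace ToyTokens
{
    const int END = 0;
    const int ID = 1;
    const int INTLIT = 2;
}

// Hypothetical lexer that mimics only the ++lexer / *lexer protocol.
class ToyLexer
{
public:
    explicit ToyLexer(const std::string& content_) : content(content_), pos(0), current(ToyTokens::END) {}
    // Advances to the next token; must be called once before the first *lexer.
    void operator++()
    {
        while (pos < content.size() && std::isspace(At(pos))) ++pos;
        std::size_t start = pos;
        if (pos == content.size())
        {
            current = ToyTokens::END;
        }
        else if (std::isalpha(At(pos)))
        {
            while (pos < content.size() && std::isalnum(At(pos))) ++pos;
            current = ToyTokens::ID;
        }
        else if (std::isdigit(At(pos)))
        {
            while (pos < content.size() && std::isdigit(At(pos))) ++pos;
            current = ToyTokens::INTLIT;
        }
        else
        {
            throw std::runtime_error(std::string("invalid character '") + content[pos] + "'");
        }
        match = content.substr(start, pos - start);
    }
    // Returns the identifier of the current token.
    int operator*() const { return current; }
    std::string match; // text of the current token
private:
    unsigned char At(std::size_t i) const { return static_cast<unsigned char>(content[i]); }
    std::string content;
    std::size_t pos;
    int current;
};

// Collects (token identifier, match) pairs with the same loop shape as TestMinilangLexer.
std::vector<std::pair<int, std::string>> Tokenize(const std::string& text)
{
    std::vector<std::pair<int, std::string>> tokens;
    ToyLexer lexer(text);
    ++lexer;
    while (*lexer != ToyTokens::END)
    {
        tokens.emplace_back(*lexer, lexer.match);
        ++lexer;
    }
    return tokens;
}
```

Calling Tokenize("int n 42") yields three pairs: two ID tokens and one INTLIT token. The toy lexer has no keyword table, so "int" is classified as ID here, whereas the Minilang lexer reports INT. A character that matches no rule, such as '@', makes the toy lexer throw std::runtime_error.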
    

Main Function

We will add a simple command option interface to the main function: if the command option is "--lexer-test", the TestMinilangLexer() function is called with the test file path:

        void PrintUsage()
        {
            std::cout << "Usage: minilang [options] { file.minilang }" << std::endl;
            std::cout << "Options:" << std::endl;
            std::cout << "--help | -h:" << std::endl;
            std::cout << "  Print help and exit." << std::endl;
            std::cout << "--lexer-test | -l" << std::endl;
            std::cout << "  Test lexical analyzer with <file.minilang>." << std::endl;
        }

        enum class Command
        {
            none, lexerTest
        };

        int main(int argc, const char** argv)
        {
            try
            {
                InitApplication();
                std::vector<std::string> files;
                Command command = Command::none;
                for (int i = 1; i < argc; ++i)
                {
                    std::string arg = argv[i];
                    if (soulng::util::StartsWith(arg, "--"))
                    {
                        if (arg == "--help")
                        {
                            PrintUsage();
                            return 1;
                        }
                        else if (arg == "--lexer-test")
                        {
                            command = Command::lexerTest;
                        }
                        else
                        {
                            throw std::runtime_error("unknown argument '" + arg + "'");
                        }
                    }
                    else if (soulng::util::StartsWith(arg, "-"))
                    {
                        std::string options = arg.substr(1);
                        if (options.empty())
                        {
                            throw std::runtime_error("unknown argument '" + arg + "'");
                        }
                        for (char o : options)
                        {
                            if (o == 'h')
                            {
                                PrintUsage();
                                return 1;
                            }
                            else if (o == 'l')
                            {
                                command = Command::lexerTest;
                            }
                            else
                            {
                                throw std::runtime_error("unknown argument '-" + std::string(1, o) + "'");
                            }
                        }
                    }
                    else
                    {
                        files.push_back(soulng::util::GetFullPath(arg));
                    }
                }
                if (files.empty() || command == Command::none)
                {
                    PrintUsage();
                    return 1;
                }
                for (const std::string& filePath : files)
                {
                    if (command == Command::lexerTest)
                    {
                        TestMinilangLexer(filePath);
                    }
                    else
                    {
                        PrintUsage();
                        throw std::runtime_error("minilang: unknown command");
                    }
                }
            }
            catch (const std::exception& ex)
            {
                std::cerr << ex.what() << std::endl;
                return 1;
            }
            DoneApplication();
            return 0;
        }
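
The option handling can also be factored into a function that is easy to test on its own. The following is a minimal sketch, not part of the Minilang example: ParseArgs and ParsedArgs are hypothetical names, the Command enumeration gets an extra hypothetical help member, and plain std::string operations are used in place of soulng::util::StartsWith.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical variant of the Command enumeration with an added 'help' member.
enum class Command
{
    none, help, lexerTest
};

// Hypothetical result type: the selected command and the file arguments.
struct ParsedArgs
{
    Command command = Command::none;
    std::vector<std::string> files;
};

// Hypothetical helper: classifies each argument as a long option ("--name"),
// a group of short options ("-hl") or a file path, like the loop in main above.
ParsedArgs ParseArgs(const std::vector<std::string>& args)
{
    ParsedArgs parsed;
    for (const std::string& arg : args)
    {
        if (arg.compare(0, 2, "--") == 0)
        {
            if (arg == "--help") { parsed.command = Command::help; return parsed; }
            else if (arg == "--lexer-test") { parsed.command = Command::lexerTest; }
            else { throw std::runtime_error("unknown argument '" + arg + "'"); }
        }
        else if (!arg.empty() && arg[0] == '-')
        {
            if (arg.size() == 1)
            {
                throw std::runtime_error("unknown argument '" + arg + "'");
            }
            for (char o : arg.substr(1))
            {
                if (o == 'h') { parsed.command = Command::help; return parsed; }
                else if (o == 'l') { parsed.command = Command::lexerTest; }
                else { throw std::runtime_error("unknown argument '-" + std::string(1, o) + "'"); }
            }
        }
        else
        {
            parsed.files.push_back(arg);
        }
    }
    return parsed;
}
```

A main function could build the argument vector with std::vector<std::string>(argv + 1, argv + argc), call ParseArgs inside the try block, and print the usage text when the command is help or none. Unlike the main function above, this sketch does not convert file arguments to full paths.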
    

Here's the complete Main.cpp file:

        #include <minilang/MinilangLexer.hpp>
        #include <minilang/MinilangTokens.hpp>
        #include <soulng/util/InitDone.hpp>
        #include <soulng/util/MappedInputFile.hpp>
        #include <soulng/util/Unicode.hpp>
        #include <soulng/util/Path.hpp>
        #include <soulng/util/TextUtils.hpp>
        #include <iostream>
        #include <stdexcept>
        #include <string>
        #include <vector>

        void TestMinilangLexer(const std::string& minilangFilePath)
        {
            std::cout << "> " << minilangFilePath << std::endl;
            std::string s = soulng::util::ReadFile(minilangFilePath);
            std::u32string content = soulng::unicode::ToUtf32(s);
            MinilangLexer lexer(content, minilangFilePath, 0);
            ++lexer;
            while (*lexer != MinilangTokens::END)
            {
                std::u32string match = lexer.token.match.ToString();
                std::cout << MinilangTokens::GetTokenName(*lexer) << "(" << soulng::unicode::ToUtf8(match) << ")" << std::endl;
                ++lexer;
            }
            std::cout << "end of file '" << minilangFilePath << "' reached" << std::endl;
        }

        void InitApplication()
        {
            soulng::util::Init();
        }

        void DoneApplication()
        {
            soulng::util::Done();
        }

        void PrintUsage()
        {
            std::cout << "Usage: minilang [options] { file.minilang }" << std::endl;
            std::cout << "Options:" << std::endl;
            std::cout << "--help | -h:" << std::endl;
            std::cout << "  Print help and exit." << std::endl;
            std::cout << "--lexer-test | -l" << std::endl;
            std::cout << "  Test lexical analyzer with <file.minilang>." << std::endl;
        }

        enum class Command
        {
            none, lexerTest
        };

        int main(int argc, const char** argv)
        {
            try
            {
                InitApplication();
                std::vector<std::string> files;
                Command command = Command::none;
                for (int i = 1; i < argc; ++i)
                {
                    std::string arg = argv[i];
                    if (soulng::util::StartsWith(arg, "--"))
                    {
                        if (arg == "--help")
                        {
                            PrintUsage();
                            return 1;
                        }
                        else if (arg == "--lexer-test")
                        {
                            command = Command::lexerTest;
                        }
                        else
                        {
                            throw std::runtime_error("unknown argument '" + arg + "'");
                        }
                    }
                    else if (soulng::util::StartsWith(arg, "-"))
                    {
                        std::string options = arg.substr(1);
                        if (options.empty())
                        {
                            throw std::runtime_error("unknown argument '" + arg + "'");
                        }
                        for (char o : options)
                        {
                            if (o == 'h')
                            {
                                PrintUsage();
                                return 1;
                            }
                            else if (o == 'l')
                            {
                                command = Command::lexerTest;
                            }
                            else
                            {
                                throw std::runtime_error("unknown argument '-" + std::string(1, o) + "'");
                            }
                        }
                    }
                    else
                    {
                        files.push_back(soulng::util::GetFullPath(arg));
                    }
                }
                if (files.empty() || command == Command::none)
                {
                    PrintUsage();
                    return 1;
                }
                for (const std::string& filePath : files)
                {
                    if (command == Command::lexerTest)
                    {
                        TestMinilangLexer(filePath);
                    }
                    else
                    {
                        PrintUsage();
                        throw std::runtime_error("minilang: unknown command");
                    }
                }
            }
            catch (const std::exception& ex)
            {
                std::cerr << ex.what() << std::endl;
                return 1;
            }
            DoneApplication();
            return 0;
        }
    

The project should now build without errors.

Running the First Test

The first test file, fibonacci.minilang, is written in the Minilang language and contains a function that computes Fibonacci numbers:

        int fibonacci(int n)
        {
            if (n == 0) return 0;
            if (n == 1) return 1;
            return fibonacci(n - 1) + fibonacci(n - 2);
        }
    

Set the Configuration Properties / Debugging / Command Arguments to the value --lexer-test test\fibonacci.minilang and run the program.

We get the following output:

        > C:/soulng-1.0.0/examples/minilang/test/fibonacci.minilang
        INT(int)
        ID(fibonacci)
        LPAREN(()
        INT(int)
        ID(n)
        RPAREN())
        LBRACE({)
        IF(if)
        LPAREN(()
        ID(n)
        EQ(==)
        INTLIT(0)
        RPAREN())
        RETURN(return)
        INTLIT(0)
        SEMICOLON(;)
        IF(if)
        LPAREN(()
        ID(n)
        EQ(==)
        INTLIT(1)
        RPAREN())
        RETURN(return)
        INTLIT(1)
        SEMICOLON(;)
        RETURN(return)
        ID(fibonacci)
        LPAREN(()
        ID(n)
        MINUS(-)
        INTLIT(1)
        RPAREN())
        PLUS(+)
        ID(fibonacci)
        LPAREN(()
        ID(n)
        MINUS(-)
        INTLIT(2)
        RPAREN())
        SEMICOLON(;)
        RBRACE(})
        end of file 'C:/soulng-1.0.0/examples/minilang/test/fibonacci.minilang' reached
    

Running the Second Test

The lexer operates internally with UTF-32 characters and has Unicode identifier classes enabled, so we next test the lexer with some Unicode identifiers.

The unicodeid.minilang test file contains identifiers with Finnish characters:

        int örkki()
        {
            int öljyä = 1;
            return öljyä;
        }
    

Set the Configuration Properties / Debugging / Command Arguments to the value --lexer-test test\unicodeid.minilang and run the program.

We get the following output:

        > C:/soulng-1.0.0/examples/minilang/test/unicodeid.minilang
        INT(int)
        ID(├Ârkki)
        LPAREN(()
        RPAREN())
        LBRACE({)
        INT(int)
        ID(├Âljy├ñ)
        ASSIGN(=)
        INTLIT(1)
        SEMICOLON(;)
        RETURN(return)
        ID(├Âljy├ñ)
        SEMICOLON(;)
        RBRACE(})
        end of file 'C:/soulng-1.0.0/examples/minilang/test/unicodeid.minilang' reached
    

The output does not look right because, as far as we know, the Windows console cannot handle UTF-8 text properly, unlike a Linux terminal with a UTF-8 locale enabled. Text needs to be converted to UTF-16 to display correctly on the Windows console. We have therefore added an operator<<() function that can print UTF-32 strings. The function is declared in ConsoleUnicode.hpp and defined in ConsoleUnicode.cpp:

// ConsoleUnicode.hpp:

        #ifndef CONSOLE_UNICODE_HPP
        #define CONSOLE_UNICODE_HPP
        #include <ostream>
        #include <string>

        std::ostream& operator<<(std::ostream& s, const std::u32string& utf32Str);
        #endif // CONSOLE_UNICODE_HPP
    

// ConsoleUnicode.cpp:

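        #include "ConsoleUnicode.hpp"
        #include <soulng/util/Unicode.hpp>
        #include <cstdio>
        #include <iostream>
        #include <stdexcept>
        #ifdef _WIN32
            #include <io.h>
            #include <fcntl.h>
        #endif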

        #ifdef _WIN32

        void SetStdHandlesToUtf16Mode()
        {
            _setmode(0, _O_U16TEXT);
            _setmode(1, _O_U16TEXT);
            _setmode(2, _O_U16TEXT);
        }

        void SetStdHandlesToNarrowMode()
        {
            _setmode(0, _O_TEXT);
            _setmode(1, _O_TEXT);
            _setmode(2, _O_TEXT);
        }

        bool IsHandleRedirected(int handle)
        {
            return !_isatty(handle);
        }

        struct UnicodeWriteGuard
        {
            UnicodeWriteGuard()
            {
                SetStdHandlesToUtf16Mode();
            }
            ~UnicodeWriteGuard()
            {
                SetStdHandlesToNarrowMode();
            }
        };

        void WriteUtf16StrToStdOutOrStdErr(const std::u16string& str, FILE* file)
        {
            // precondition: file must be stdout or stderr
            if (file != stdout && file != stderr)
            {
                throw std::runtime_error("WriteUtf16StrToStdOutOrStdErr: precondition violation: file must be stdout or stderr");
            }
            UnicodeWriteGuard unicodeWriteGuard;
            size_t result = std::fwrite(str.c_str(), sizeof(char16_t), str.length(), file);
            if (result != str.length())
            {
                throw std::runtime_error("could not write Unicode text");
            }
        }

        std::ostream& operator<<(std::ostream& s, const std::u32string& utf32Str)
        {
            if (&s == &std::cout && !IsHandleRedirected(1))
            {
                std::u16string utf16Str = soulng::unicode::ToUtf16(utf32Str);
                WriteUtf16StrToStdOutOrStdErr(utf16Str, stdout);
                return s;
            }
            else if (&s == &std::cerr && !IsHandleRedirected(2))
            {
                std::u16string utf16Str = soulng::unicode::ToUtf16(utf32Str);
                WriteUtf16StrToStdOutOrStdErr(utf16Str, stderr);
                return s;
            }
            else
            {
                return s << soulng::unicode::ToUtf8(utf32Str);
            }
        }

        #else // !_WIN32

        std::ostream& operator<<(std::ostream& s, const std::u32string& utf32Str)
        {
            return s << soulng::unicode::ToUtf8(utf32Str);
        }

        #endif
    

Now strings can be printed to std::cout and std::cerr as UTF-32 strings. On Windows, the operator<<() function converts its argument to UTF-16 when printing to the console. When the output is redirected, and on other platforms, it converts the argument to UTF-8.
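
The UTF-32 to UTF-16 step is simple enough to sketch by hand. The following is an illustration only, not the soulng ToUtf16 implementation: EncodeUtf16 is a hypothetical helper that writes code points below U+10000 as one 16-bit unit and splits larger code points into a surrogate pair.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of soulng: encodes UTF-32 code points as UTF-16.
std::u16string EncodeUtf16(const std::u32string& utf32)
{
    std::u16string result;
    for (char32_t c : utf32)
    {
        if ((c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
        {
            throw std::runtime_error("invalid UTF-32 code point");
        }
        if (c < 0x10000)
        {
            // Basic Multilingual Plane: one 16-bit code unit
            result.push_back(static_cast<char16_t>(c));
        }
        else
        {
            // supplementary planes: high surrogate carries the upper ten bits,
            // low surrogate the lower ten bits of (c - 0x10000)
            char32_t v = c - 0x10000;
            result.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            result.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return result;
}
```

The Finnish letters in the test file are all below U+10000, so each of them becomes a single UTF-16 code unit. A code point such as U+1F600 becomes the pair 0xD83D 0xDE00.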

The TestMinilangLexer function must be changed so that it does *not* call ToUtf8 for the match variable. With ConsoleUnicode.hpp included in Main.cpp, the match can now be printed as a UTF-32 string:

        void TestMinilangLexer(const std::string& minilangFilePath)
        {
            std::cout << "> " << minilangFilePath << std::endl;
            std::string s = soulng::util::ReadFile(minilangFilePath);
            std::u32string content = soulng::unicode::ToUtf32(s);
            MinilangLexer lexer(content, minilangFilePath, 0);
            ++lexer;
            while (*lexer != MinilangTokens::END)
            {
                std::u32string match = lexer.token.match.ToString();
                std::cout << MinilangTokens::GetTokenName(*lexer) << "(" << match << ")" << std::endl;
                ++lexer;
            }
            std::cout << "end of file '" << minilangFilePath << "' reached" << std::endl;
        }
    

The main function must also be changed to print the error message, which may contain non-ASCII characters, as UTF-32:

        // ...

        catch (const std::exception& ex)
        {
            std::cerr << soulng::unicode::ToUtf32(ex.what()) << std::endl;
            return 1;
        }
    

With these changes implemented, the output now looks right on the Windows console as well:

        > C:/soulng-1.0.0/examples/minilang/test/unicodeid.minilang
        INT(int)
        ID(örkki)
        LPAREN(()
        RPAREN())
        LBRACE({)
        INT(int)
        ID(öljyä)
        ASSIGN(=)
        INTLIT(1)
        SEMICOLON(;)
        RETURN(return)
        ID(öljyä)
        SEMICOLON(;)
        RBRACE(})
        end of file 'C:/soulng-1.0.0/examples/minilang/test/unicodeid.minilang' reached
    

Running the Third Test

Now we test the lexer with an erroneous file, invalid.minilang:

        @
    

Set the Configuration Properties / Debugging / Command Arguments to the value --lexer-test test\invalid.minilang and run the program.

We get the following error:

        > C:/soulng-1.0.0/examples/minilang/test/invalid.minilang
        soulng::lexer::Lexer::NextToken(): error: invalid character '@' in file 'C:/soulng-1.0.0/examples/minilang/test/invalid.minilang' at line 1
    

The lexer has no rule for the '@' character, so it generates the preceding error.

Running the Fourth Test

The fourth test file, gcd.minilang, contains the greatest common divisor algorithm:

        int gcd(int a, int b)
        {
            while (b != 0)
            {
                a = a % b;
                int t = a;
                a = b;
                b = t;
            }
            return a;
        }
    

Here's the output:

        > C:/soulng-1.0.0/examples/minilang/test/gcd.minilang
        INT(int)
        ID(gcd)
        LPAREN(()
        INT(int)
        ID(a)
        COMMA(,)
        INT(int)
        ID(b)
        RPAREN())
        LBRACE({)
        WHILE(while)
        LPAREN(()
        ID(b)
        NEQ(!=)
        INTLIT(0)
        RPAREN())
        LBRACE({)
        ID(a)
        ASSIGN(=)
        ID(a)
        MOD(%)
        ID(b)
        SEMICOLON(;)
        INT(int)
        ID(t)
        ASSIGN(=)
        ID(a)
        SEMICOLON(;)
        ID(a)
        ASSIGN(=)
        ID(b)
        SEMICOLON(;)
        ID(b)
        ASSIGN(=)
        ID(t)
        SEMICOLON(;)
        RBRACE(})
        RETURN(return)
        ID(a)
        SEMICOLON(;)
        RBRACE(})
        end of file 'C:/soulng-1.0.0/examples/minilang/test/gcd.minilang' reached
    

Next we will show how to write parsers that can utilize the lexer...
