Cmajor source files are ordinary UTF-8-encoded plain text files. Each source file consists lexically of keywords, identifiers, literals and operators that can be separated by comments and white space characters such as spaces, tabulations, and newline characters.
Lexical elements such as keywords, identifiers, literals and operators may be separated by any number of white space characters and comments, as described by the syntax rule white‑space‑and‑comments.
white‑space‑and‑comments | → | (white‑space‑char | comment)* |
white‑space‑char | → | 'any Unicode character having property WSpace=Y' |
comment | → | line‑comment | block‑comment |
line‑comment | → | // [^\r\n]* newline |
newline | → | \r\n | \n | \r |
block‑comment | → | /* (any‑char − */)* */ |
any‑char | → | 'any Unicode character' |
See the Wikipedia article on Unicode whitespace characters.
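The white‑space‑and‑comments rule can be approximated with a regular expression. The following Python sketch is only an illustration and is not part of the Cmajor toolchain: the names WS_AND_COMMENTS and skip_ws_and_comments are hypothetical, and Python's \s class is assumed to be a close enough stand-in for the WSpace=Y property.

```python
import re

# One repetition of the rule: a white space character, a line comment
# (// up to and including a newline), or a block comment (/* ... */).
# Assumption: Python's \s approximates "any Unicode character having WSpace=Y".
WS_AND_COMMENTS = re.compile(
    r"(?:\s|//[^\r\n]*(?:\r\n|\n|\r)|/\*.*?\*/)*",
    re.DOTALL,
)

def skip_ws_and_comments(source, pos=0):
    """Return the index of the first character after white space and comments."""
    return WS_AND_COMMENTS.match(source, pos).end()

source = "  // a line comment\n/* a block comment */ int x;"
print(source[skip_ws_and_comments(source):])  # int x;
```

As in the grammar above, a line comment is recognized here only when it is terminated by a newline.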
Keywords have a context-dependent meaning in programs. They cannot be used as identifiers.
keyword | → | abstract | and | as | axiom | base | bool | break | byte | case | cast | catch | cdecl | char | class | concept | const | constexpr | construct | continue | default | delegate | delete | destroy | do | double | else | enum | explicit | extern | false | float | for | goto | if | inline | int | interface | internal | is | long | namespace | new | not | nothrow | null | operator | or | override | private | protected | public | return | sbyte | short | sizeof | static | suppress | switch | this | throw | true | try | typedef | typename | uchar | uint | ulong | ushort | using | virtual | void | wchar | where | while |
Identifiers are used to name variables, parameters, types, constants, namespaces, typedefs and functions.
A qualified‑id can be used to refer to an entity when its simple identifier would be ambiguous in context. It consists of the names of the namespaces containing the entity (if any), separated by periods, followed by the names of the types containing the entity (if any), separated by periods, followed by the identifier of the entity itself. An enumeration constant must always be referred to by prefixing its name with the name of the enumerated type that contains it and a period. A local variable cannot be referred to by a fully qualified identifier. An identifier of a local variable always refers to the local variable in the innermost scope that contains it.
identifier | → | id‑char‑sequence − keyword |
id‑char‑sequence | → | idstart idcont* |
idstart | → | 'any Unicode character having property ID_Start' |
idcont | → | 'any Unicode character having property ID_Continue' |
qualified‑id | → | identifier (. identifier)* |
Information about Unicode identifier syntax can be found in article UNICODE IDENTIFIER AND PATTERN SYNTAX.
Literals are used to enter values in a program. Literals have value and type.
literal | → | boolean‑literal | floating‑literal | integer‑literal | char‑literal | string‑literal | null‑literal | array‑literal | class‑literal |
boolean‑literal | → | true | false |
floating‑literal | → | (fractional‑floating‑literal | exponent‑floating‑literal) [fF]? |
fractional‑floating‑literal | → | dec‑digit‑sequence? . dec‑digit‑sequence exponent‑part? | dec‑digit‑sequence . |
dec‑digit‑sequence | → | [0-9]+ |
exponent‑floating‑literal | → | dec‑digit‑sequence exponent‑part |
exponent‑part | → | [eE] sign? dec‑digit‑sequence |
sign | → | + | - |
integer‑literal | → | (hex‑integer‑literal | dec‑integer‑literal) [uU]? |
hex‑integer‑literal | → | (0x | 0X) hex‑digit‑sequence |
hex‑digit‑sequence | → | [0-9a-fA-F]+ |
dec‑integer‑literal | → | dec‑digit‑sequence |
char‑literal | → | (w | u)? ' ([^'\\\r\n]+ | char‑escape) ' |
char‑escape | → | \ ([xX] hex‑digit‑sequence | [dD] dec‑digit‑sequence | octal‑digit‑sequence | u hex‑digit‑4 | U hex‑digit‑8 | [abfnrtv] | any‑char) |
octal‑digit‑sequence | → | [0-7]+ |
hex‑digit‑4 | → | hex‑digit hex‑digit hex‑digit hex‑digit |
hex‑digit‑8 | → | hex‑digit hex‑digit hex‑digit hex‑digit hex‑digit hex‑digit hex‑digit hex‑digit |
hex‑digit | → | [0-9a-fA-F] |
string‑literal | → | raw‑string‑literal | regular‑string‑literal |
raw‑string‑literal | → | (w | u)? @ " [^"]* " |
regular‑string‑literal | → | (w | u)? " ([^"\\\r\n]+ | char‑escape)* " |
null‑literal | → | null |
array‑literal | → | [ (constant‑expression (, constant‑expression)* )? ] |
class‑literal | → | { (constant‑expression (, constant‑expression)* )? } |
A Boolean literal can have value true or false. The type of Boolean literals is bool.
A floating literal represents a fractional or exponential floating-point number. If it has 'f' or 'F' suffix its type is float, otherwise its type is double.
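The floating‑literal grammar and the suffix rule can be tried out with a small Python sketch. It is an illustration under the grammar given above, not the compiler's actual lexer; FLOATING_LITERAL and floating_literal_type are hypothetical names.

```python
import re

FLOATING_LITERAL = re.compile(
    r"(?:"
    r"[0-9]*\.[0-9]+(?:[eE][+-]?[0-9]+)?"  # dec-digit-sequence? . dec-digit-sequence exponent-part?
    r"|[0-9]+\."                           # dec-digit-sequence .
    r"|[0-9]+[eE][+-]?[0-9]+"              # exponent-floating-literal
    r")([fF]?)"                            # optional f or F suffix
)

def floating_literal_type(text):
    """Return 'float' or 'double' for a floating literal, or None otherwise."""
    match = FLOATING_LITERAL.fullmatch(text)
    if match is None:
        return None
    return "float" if match.group(1) else "double"

for text in ["1.5", "1.5f", ".5e-3", "1e10F", "42"]:
    print(text, floating_literal_type(text))
```

The last input, 42, is reported as None because it is an integer literal rather than a floating literal.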
An integer literal represents a hexadecimal or decimal signed or unsigned integer. If it has u or U suffix, it represents an unsigned integer literal, and its type is the smallest of the following types that can contain its value: byte, ushort, uint, ulong (see basic integer types). Otherwise it represents a signed integer literal, and its type is the smallest of the following types that can contain its value: sbyte, short, int, long (see basic integer types). If the literal has prefix 0x or 0X, it represents a hexadecimal, or base 16, value, otherwise it represents a decimal, or base 10, value.
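The "smallest type that can contain the value" rule can be modelled as follows. This Python sketch only restates the rule described above; the function name integer_literal_type and the two tables are hypothetical, and the real compiler may treat edge cases differently.

```python
import re

INTEGER_LITERAL = re.compile(r"(?:0[xX]([0-9a-fA-F]+)|([0-9]+))([uU]?)")

SIGNED_TYPES = [("sbyte", 8), ("short", 16), ("int", 32), ("long", 64)]
UNSIGNED_TYPES = [("byte", 8), ("ushort", 16), ("uint", 32), ("ulong", 64)]

def integer_literal_type(text):
    """Return the name of the smallest basic type that can hold the literal."""
    match = INTEGER_LITERAL.fullmatch(text)
    if match is None:
        raise ValueError("not an integer literal: " + text)
    hex_digits, dec_digits, suffix = match.groups()
    value = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    if suffix:
        # u or U suffix: pick among the unsigned types.
        for name, bits in UNSIGNED_TYPES:
            if value < 2 ** bits:
                return name
    else:
        # No suffix: pick among the signed types (literals are never negative).
        for name, bits in SIGNED_TYPES:
            if value < 2 ** (bits - 1):
                return name
    raise ValueError("literal too large: " + text)

for text in ["127", "128", "255u", "0xFFFF", "0xFFFFu", "4294967296"]:
    print(text, integer_literal_type(text))
```

Because the grammar gives integer literals no sign, an expression such as -128 is read in this model as the unary minus operator applied to the literal 128.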
A character literal represents an ASCII or a Unicode character value. Graphical character values can be entered by enclosing the character in single quotes. An escape mechanism is provided for entering character values that do not have a graphical representation. If the character literal has a w prefix, its type is wchar; if it has a u prefix, its type is uchar; otherwise its type is char.
By prefixing a character value with the backslash \ character, the ASCII or Unicode code point of the character can be given in hexadecimal (x, X, u or U prefix), decimal (d or D prefix), or octal notation (no prefix). Some special control characters can also be entered using the character combinations \a, \b, \f, \n, \r, \t and \v. Their meaning can be found in the Wikipedia article for ASCII.
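The escape notations can be compared side by side with a small decoder. The following Python sketch follows the char‑escape grammar given earlier; decode_char_escape and CONTROL_CODES are hypothetical names, and the sketch is not taken from the Cmajor implementation.

```python
import re

# ASCII codes of the special control character escapes.
CONTROL_CODES = {"a": 7, "b": 8, "f": 12, "n": 10, "r": 13, "t": 9, "v": 11}

def decode_char_escape(escape):
    """Return the code point denoted by an escape such as \\x41 or \\n."""
    if not escape.startswith("\\") or len(escape) < 2:
        raise ValueError("not an escape: " + escape)
    body = escape[1:]
    if re.fullmatch(r"[xX][0-9a-fA-F]+", body):
        return int(body[1:], 16)           # hexadecimal
    if re.fullmatch(r"[dD][0-9]+", body):
        return int(body[1:], 10)           # decimal
    if re.fullmatch(r"[0-7]+", body):
        return int(body, 8)                # octal, no prefix
    if re.fullmatch(r"u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8}", body):
        return int(body[1:], 16)           # 4 or 8 hexadecimal digits
    if body in CONTROL_CODES:
        return CONTROL_CODES[body]         # \a \b \f \n \r \t \v
    if len(body) == 1:
        return ord(body)                   # any other character stands for itself
    raise ValueError("unknown escape: " + escape)

# Four spellings of the letter A (code point 65).
print([decode_char_escape(e) for e in [r"\x41", r"\d65", r"\101", r"\u0041"]])
```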
A string literal represents an ASCII, a Unicode UTF-8, a Unicode UTF-16, or a Unicode UTF-32 encoded string. The string is entered by enclosing its value in double quotes. If the string literal is prefixed with @ character the content may have no escapes, in other words, the backslash character \ has its literal meaning, otherwise the backslash character provides an escape mechanism for entering non-graphical character values in the same way as described in section for character literals.
If the string literal has a w prefix, its type is const wchar*, and it represents a Unicode UTF-16 encoded string. If the string literal has a u prefix, its type is const uchar*, and it represents a Unicode UTF-32 encoded string. If the string literal has no w or u prefix, its type is const char*, and it represents an ASCII or a Unicode UTF-8 encoded string. Note: by convention Cmajor source files have UTF-8 encoding, so that string literals are always entered using UTF-8 encoding, but internal representation of a string in a program can be ASCII, UTF-8, UTF-16 or UTF-32 encoded string.
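To see why the three string types differ, it helps to count the code units the same text needs in each encoding. The Python sketch below is an illustration only; code_unit_counts is a hypothetical helper and says nothing about how a Cmajor compiler stores strings beyond the encodings named above.

```python
def code_unit_counts(text):
    """Count the code units needed for text in UTF-8, UTF-16 and UTF-32."""
    return {
        "char (UTF-8, 8-bit units)": len(text.encode("utf-8")),
        "wchar (UTF-16, 16-bit units)": len(text.encode("utf-16-le")) // 2,
        "uchar (UTF-32, 32-bit units)": len(text.encode("utf-32-le")) // 4,
    }

# Three characters: ASCII 'a', U+00E4, and U+1F600 (outside the 16-bit range).
sample = "a\u00e4\U0001F600"
for name, count in code_unit_counts(sample).items():
    print(name, count)
```

For this sample the counts are 7, 4 and 3: only the UTF-32 form uses exactly one code unit per character.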
The null literal represents a special value of a pointer that does not point to any memory location. Its type is a special @nullptr_type that is implicitly convertible to any other pointer type.
An array literal represents a value of a constant array. Elements of a constant array must be constant expressions that evaluate to literals, constants or enumeration constants.
A class literal represents a value of literal class. Members of a literal class must be constant expressions that evaluate to literals, constants or enumeration constants.
Operators allow expressions to be written with a notation close to mathematical notation.
operator | → | . | [ | ] | < | > | , | = | <=> | => | || | && | | | ^ | & | == | != | <= | >= | < | > | << | >> | + | − | * | / | % | ++ | −− | ! | ~ | −> | ( | ) |
Programming language constructs such as variables, parameters, constants and literal values have a type. A type provides an interpretation of the contents of such a construct and specifies the possible values of that construct. A language has a small number of predefined built-in types, also called basic types or primitive types. The Cmajor language defines the following basic types for operating with truth values, numbers and characters.
basic‑type | → | bool | sbyte | byte | short | ushort | int | uint | long | ulong | float | double | char | wchar | uchar | void |
The bool type represents a truth value. It has values true and false.
The sbyte type is a signed 8-bit integer type. It has values -128...127.
The byte type is an unsigned 8-bit integer type. It has values 0u...255u.
The short type is a signed 16-bit integer type. It has values -32768...32767.
The ushort type is an unsigned 16-bit integer type. It has values 0u...65535u.
The int type is a signed 32-bit integer type. It has values -2147483648...2147483647.
The uint type is an unsigned 32-bit integer type. It has values 0u...4294967295u.
The long type is a signed 64-bit integer type. It has values -9223372036854775808...9223372036854775807.
The ulong type is an unsigned 64-bit integer type. It has values 0u...18446744073709551615u.
The float type is a 32-bit single precision floating-point number type.
The double type is a 64-bit double precision floating-point number type.
The char type is an unsigned 8-bit character type. It can have an ASCII code value.
The wchar type is an unsigned 16-bit character type. It can have a Unicode UTF-16 code point value.
The uchar type is an unsigned 32-bit character type. It can have a Unicode UTF-32 code point value.
The void keyword represents lack of value.
Most expressions consist of operators and operands. Some expressions can also contain keywords. Operands can be names of constants, variables, parameters, types, namespaces and functions. They can also be literal values or subexpressions. Expressions can be evaluated, or have their value computed.
Many operators can be overloaded. An overloaded operator has the same name as some built-in operator, but takes at least one parameter that is of a user-defined type.
Expressions can be classified as being infix expressions where operator is between the operands, prefix expressions where operator comes before the operand and postfix expressions where operator comes after the operand.
expression | → | equivalence |
equivalence | → | implication (<=> implication)* |
implication | → | disjunction (=> disjunction)? |
disjunction | → | conjunction (|| conjunction)* |
conjunction | → | bit‑or (&& bit‑or)* |
bit‑or | → | bit‑xor (| bit‑xor)* |
bit‑xor | → | bit‑and (^ bit‑and)* |
bit‑and | → | equality (& equality)* |
equality | → | relational ((== | !=) relational)* |
relational | → | shift ((<= | >= | < | >) shift)* | shift is type-expr | shift as type-expr |
shift | → | additive ((<< | >>) additive )* |
additive | → | multiplicative ((+ | −) multiplicative )* |
multiplicative | → | prefix ((* | / | %) prefix )* |
prefix | → | (++ | −− | + | − | ! | ~ | * | &) prefix | postfix |
postfix | → | primary (++ | −− | . identifier | −> identifier | [ expression ] | ( argument‑list ))* |
primary | → | ( expression ) | literal | basic‑type | template‑id | identifier | this | base | size‑of‑expr | type‑name‑expr | cast‑expr | construct‑expr | new‑expr |
size‑of‑expr | → | sizeof ( expression ) |
type‑name‑expr | → | typename ( expression ) |
cast‑expr | → | cast < type-expr > ( expression ) |
construct‑expr | → | construct < type-expr > ( expression‑list ) |
new‑expr | → | new type-expr ( argument‑list ) |
argument‑list | → | expression‑list? |
expression‑list | → | expression (, expression)* |
constant‑expression | → | expression |
An expression that is a true equivalence expression (and not just a disjunction for example) can only be used in axioms. The same applies to implication expression. An expression used in an axiom, for example a != b <=> !(a == b) is not evaluated at all, it has purely informative value.
A disjunction expression takes bool type operands and yields a bool type result. A disjunction, for example a || b, is true if either a or b, or both, evaluate to true. It is false otherwise. Disjunctive expressions are evaluated using so-called short-circuit evaluation: if the left operand is true, the right operand is not evaluated, because the result is already known to be true.
A conjunction expression takes bool type operands and yields a bool type result. A conjunction, for example a && b, is true, if both a and b evaluate to true. It is false otherwise. Conjunctive expressions are also evaluated using short-circuit evaluation: if the left operand is false, the right operand is not evaluated, because the result is already known to be false.
The || and && operators cannot be overloaded.
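Short-circuit evaluation becomes visible when evaluating an operand has a side effect. The Python sketch below uses Python's own or and and operators, which short-circuit in the same way, as a stand-in for || and &&; the helper names operand, disjunction_trace and conjunction_trace are hypothetical.

```python
def operand(name, value, log):
    """Record that the operand was evaluated, then return its value."""
    log.append(name)
    return value

def disjunction_trace(a, b):
    """Evaluate a || b and report which operands were evaluated."""
    log = []
    result = operand("a", a, log) or operand("b", b, log)
    return result, log

def conjunction_trace(a, b):
    """Evaluate a && b and report which operands were evaluated."""
    log = []
    result = operand("a", a, log) and operand("b", b, log)
    return result, log

print(disjunction_trace(True, False))   # (True, ['a'])
print(disjunction_trace(False, True))   # (True, ['a', 'b'])
print(conjunction_trace(False, True))   # (False, ['a'])
print(conjunction_trace(True, False))   # (False, ['a', 'b'])
```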
Bitwise expressions, that is, bit-or, bit-xor and bit-and expressions, take integer type operands and yield an integer type result.
A bit-or expression, for example a | b, is evaluated as follows: for each bit xi of a and corresponding bit yi of b, if both bits are 0, the corresponding bit zi of the result is 0. Otherwise, if either xi, or yi is 1, the result bit zi is 1.
A bit-xor expression, for example a ^ b, is evaluated as follows: for each bit xi of a and corresponding bit yi of b, if either xi or yi, but not both, is 1, the corresponding bit zi of the result is 1. Otherwise, if xi and yi are both 0 or both 1, the result bit zi is 0.
A bit-and expression, for example a & b, is evaluated as follows: for each bit xi of a and corresponding bit yi of b, if both bits are 1, the corresponding bit zi of the result is 1. Otherwise, if either xi, or yi is 0, the result bit zi is 0.
Bitwise operators |, ^, and & can be overloaded, in which case they take operands of user-defined types and yield a result of some user-defined or built-in type.
An equality expression takes basic type (other than void), or pointer type operands and yields a bool type result.
Expression a == b evaluates to true if the value of a is equal to the value of b, and false otherwise.
Expression a != b evaluates to true if the value of a is not equal to the value of b, and false otherwise.
Equality operator == can be overloaded, in which case it takes operands of user-defined types and yields a result of some user-defined or built-in type. Inequality operator != cannot be overloaded. Instead, if equality operator is overloaded for some type T, and a and b are expressions of type T, expression a != b is equivalent to an expression !(a == b).
A relational expression takes basic type (other than bool or void), or pointer type operands and yields a bool type result.
Expression a < b evaluates to true if value of a is less than value of b, and false otherwise.
Expression a > b evaluates to true if value of a is greater than value of b, and false otherwise.
Expression a <= b evaluates to true if value of a is less than or equal to value of b, and false otherwise.
Expression a >= b evaluates to true if value of a is greater than or equal to value of b, and false otherwise.
If operands of <, >, <= or >= operators are of character types char, wchar or uchar, the comparison operators compare codepoint values, that is, numeric character code values of the operands.
If operands of <, >, <= or >= operators are pointers, the comparison operators compare the memory address values of the operands.
Less-than operator < can be overloaded, in which case it takes operands of user-defined types and yields a result of some user-defined or built-in type. Other relational operators >, <= and >= cannot be overloaded. Instead, if less-than operator is overloaded for some type T, and a and b are expressions of type T, expression a > b is equivalent to an expression b < a, expression a <= b is equivalent to an expression !(b < a), and expression a >= b is equivalent to an expression !(a < b).
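The way the remaining comparison operators follow from == and < can be sketched in Python. This is only a model of the equivalences stated above, not Cmajor code: the Version class and the four helper functions are hypothetical, and the class defines only equality and less-than.

```python
class Version:
    """A user-defined type that defines only equality and less-than."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

def not_equal(a, b):
    return not (a == b)        # a != b  is  !(a == b)

def greater(a, b):
    return b < a               # a > b   is  b < a

def less_or_equal(a, b):
    return not (b < a)         # a <= b  is  !(b < a)

def greater_or_equal(a, b):
    return not (a < b)         # a >= b  is  !(a < b)

old, new = Version(1, 2), Version(1, 10)
print(not_equal(old, new), greater(new, old),
      less_or_equal(old, old), greater_or_equal(old, new))
# True True True False
```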
If p is a pointer to an object of some polymorphic class type, and T is some polymorphic class type, the expression p is T* tests whether pointer p actually points to an object of type T or of a type U that derives from T. The test yields a bool result.
If p is a pointer to an object of some polymorphic class type, and T is some polymorphic class type, the expression p as T* tests whether pointer p actually points to an object of type T or of a type U that derives from T. If the test is successful, the result is a pointer to class T, otherwise the result is null.
The implementation of this feature is described here.
The is and as operators cannot be overloaded.
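The behaviour of is and as can be modelled with a run-time type test. In the Python sketch below, isinstance plays the role of the dynamic type check and None plays the role of null; the classes Shape, Circle and Square and the helpers is_type and as_type are hypothetical, and the sketch says nothing about how the Cmajor implementation performs the test.

```python
class Shape:
    pass

class Circle(Shape):
    pass

class Square(Shape):
    pass

def is_type(p, T):
    """Model of 'p is T*': true if p refers to a T or to a type derived from T."""
    return isinstance(p, T)

def as_type(p, T):
    """Model of 'p as T*': p itself if the test succeeds, otherwise None (null)."""
    return p if isinstance(p, T) else None

p = Circle()
print(is_type(p, Shape))           # True, Circle derives from Shape
print(is_type(p, Square))          # False
print(as_type(p, Circle) is p)     # True
print(as_type(p, Square))          # None
```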
A shift expression takes integer type operands and yields an integer type result.
Expression a << b returns the value of a shifted left by b bit positions. The vacant rightmost bit positions of the result are filled with zero bits.
Expression a >> b returns the value of a shifted right by b bit positions. The vacant leftmost bit positions of the result are filled depending on the common type of a and b: if the common type is an unsigned type (byte, ushort, uint or ulong), the leftmost bit positions of the result are zero-filled. Otherwise, if the common type is a signed type (sbyte, short, int or long), the leftmost bit positions of the result are filled with a bit equal to the leftmost bit of a.
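The difference between zero fill and sign fill is easiest to see on 8-bit patterns. The following Python sketch models fixed-width shifts on bit patterns; shift_left and shift_right are hypothetical helpers, and the sketch assumes a shift count between 0 and the bit width.

```python
def shift_left(pattern, count, bits):
    """Shift a bits-wide pattern left; vacant rightmost bits become 0."""
    mask = (1 << bits) - 1
    return (pattern << count) & mask

def shift_right(pattern, count, bits, signed):
    """Shift a bits-wide pattern right, zero-filling or sign-filling the left."""
    mask = (1 << bits) - 1
    pattern &= mask
    result = pattern >> count
    sign_bit_set = (pattern >> (bits - 1)) == 1
    if signed and sign_bit_set:
        # Fill the vacated leftmost positions with copies of the sign bit.
        result |= mask & ~(mask >> count)
    return result

a = 0b10010000
print(format(shift_left(a, 1, 8), "08b"))          # 00100000
print(format(shift_right(a, 2, 8, False), "08b"))  # 00100100 (byte)
print(format(shift_right(a, 2, 8, True), "08b"))   # 11100100 (sbyte)
```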
Shift operators << and >> can be overloaded, in which case they take operands of user-defined types and yield a result of some user-defined or built-in type. For example, in the system library the << operator has these overloads.
An additive expression takes integer, floating-point, or pointer type operands. The result is of the common type of the operand types.
When a and b are of integer or floating-point types, the expression a + b returns the sum of a and b, and the expression a − b returns the difference of a and b. If at least one operand is of a floating-point type, the result is of a floating-point type, otherwise it is of an integer type.
When p is of a pointer type and i is of an integer type, the value of expression p + i is (informally) a pointer pointing i objects "after" p. The same applies to the expression i + p. The value of expression p − i is (informally) a pointer pointing i objects "before" p. When p and q are of pointer types, the value of expression p − q is (informally) the "number of objects" between p and q.
Additive operators + and − can be overloaded, in which case they take operands of user-defined types and yield a result of some user-defined or built-in type. For example, in the system library the + operator has these overloads.
A multiplicative expression takes integer or floating-point type operands. The result is of the common type of the operand types.
Expression a * b returns the product of a multiplied by b.
Expression a / b returns the quotient of a divided by b. If a and b are of integer types, the result is of an integer type. It is truncated to the nearest whole integer towards zero.
Expression a % b returns the remainder of integer division a divided by b. Remainder operation is defined only for integer type operands. If a is negative, the result is negative, otherwise the result is nonnegative.
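Integer division that truncates towards zero, together with a remainder that takes the sign of the left operand, can be modelled as shown below. Python's own // and % operators round towards negative infinity instead, so the sketch builds the truncating versions explicitly; truncating_div and truncating_rem are hypothetical names.

```python
def truncating_div(a, b):
    """Integer quotient of a / b, truncated towards zero."""
    quotient = abs(a) // abs(b)
    return quotient if (a >= 0) == (b >= 0) else -quotient

def truncating_rem(a, b):
    """Remainder matching truncating_div: a == b * (a / b) + (a % b)."""
    return a - b * truncating_div(a, b)

for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, truncating_div(a, b), truncating_rem(a, b))
# 7 2 3 1
# -7 2 -3 -1
# 7 -2 -3 1
# -7 -2 3 -1
```

In every row the remainder is negative exactly when a is negative, as described above.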
Multiplicative operators *, / and % can be overloaded, in which case they take operands of user-defined types and yield a result of some user-defined or built-in type.
Prefix expressions have many forms. There are expressions for incrementing or decrementing a variable, returning operand negated, returning logical not of the operand, returning a bitwise complement of the operand, dereferencing a pointer, and taking address of a variable.
All prefix operators ++, −−, +, −, !, ~, * and & can be overloaded, in which case they take an operand of some user-defined type and yield a result of some user-defined or built-in type.
Prefix increment and decrement expressions take integer or pointer type variable operands and yield result of the same type.
If a is an integer variable, expression ++a increments a by one. The value of the expression is the value of a after incrementing it.
If a is an integer variable, expression −−a decrements a by one. The value of the expression is the value of a after decrementing it.
If p is a pointer to an object of type T, expression ++p increments p, so that p will point to the next object of type T in memory. The value of the expression is the value of p after incrementing it.
If p is a pointer to an object of type T, expression −−p decrements p, so that p will point to the previous object of type T in memory. The value of the expression is the value of p after decrementing it.
Unary plus and unary minus expressions take integer or floating-point type operands.
Expression +a will yield value of a and expression −a will yield value of a negated.
Logical not expression takes a bool type operand and yields a bool type result.
If value of a is true, value of !a is false, and if value of a is false, value of !a is true.
Bitwise complement expression takes an integer operand and yields an integer type result.
For each bit xi of an integer value a, the value of ~a is computed as follows: If xi is 0, the corresponding bit yi of the result will be 1, and if xi is 1, the corresponding bit yi of the result will be 0.
Pointer dereference expression takes a pointer type operand and returns the value of pointed-to type.
If p is a pointer to type T object, *p returns the value of that pointed object of type T.
Expression *p can occur also as the left side of an assignment statement, in which case the pointed-to object is assigned a new value.
When pointer dereference expression is overloaded, the return value of the operator function is often of a reference type, so the expression can be used as the left side of an assignment statement.
Address-of expression takes a variable operand and returns a pointer that contains the memory address of that variable.
If a is an object of type T, expression &a yields a pointer to type T that contains the address of a.
A postfix expression can be a postfix increment or decrement expression, a member access expression, a pointer member access expression, a subscript expression, or an invocation expression.
Postfix increment and decrement expressions take integer or pointer type variable operands and yield result of the same type.
If a is an integer variable, expression a++ increments a by one. The value of the expression is the value of a before incrementing it.
If a is an integer variable, expression a−− decrements a by one. The value of the expression is the value of a before decrementing it.
If p is a pointer to an object of type T, expression p++ increments p, so that p will point to the next object of type T in memory. The value of the expression is the value of p before incrementing it.
If p is a pointer to an object of type T, expression p−− decrements p, so that p will point to the previous object of type T in memory. The value of the expression is the value of p before decrementing it.
The postfix increment and decrement operators ++ and −− cannot be overloaded. They are implemented by the compiler if the prefix forms of the operators are overloaded.
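The difference between the prefix and postfix forms, and one way a postfix form could be derived from an overloaded prefix form, can be sketched as follows. This Python model is an assumption about the general technique (save the old value, apply the prefix operator, yield the saved value); the source does not describe the compiler's actual implementation, and IntVar, pre_increment and post_increment are hypothetical names.

```python
class IntVar:
    """A mutable integer variable."""

    def __init__(self, value):
        self.value = value

def pre_increment(var):
    """Model of ++a: increment, then yield the new value."""
    var.value += 1
    return var.value

def post_increment(var):
    """Model of a++: yield the old value, built on top of pre_increment."""
    old = var.value
    pre_increment(var)
    return old

a = IntVar(5)
print(pre_increment(a), a.value)    # 6 6
print(post_increment(a), a.value)   # 6 7
```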
Expression a.b accesses member b of namespace, type or class object a. Then b can be, depending on a, a name of a namespace, enumeration constant, constant, type, member variable, typedef, or function. If b has a type, then the type of the expression will be the type of b and the value of the expression will be the value of b.
Member access operator . cannot be overloaded.
Expression a−>b accesses member b of a class object through a pointer (case 1), or through an object of a class that overloads the −> operator (case 2).
The member b accessed can be a member variable or a member function. If b is a member variable, the type of the expression will be the type of b, and the value of expression will be the value of b. If b is a member function, the type of the expression will be the type returned by the member function b and the value of the expression will be the value returned by the member function b (if any).
If a is a pointer to class object T, then b must be a member of the class T, or member of the base class or ancestor class of T.
If a is an object of a class that overloads the −> operator, then that operator function can return either a pointer to a class object (case 1), or return an object of another class that in turn overloads the −> operator (case 2), thus providing another level of indirection.
This indirection mechanism, along with overloading the pointer dereference operator, makes it possible to implement pointer-like classes, such as "smart pointers" and iterators. For example, the system library contains a UniquePtr class having these two overloads.
A subscript expression provides access to individual elements of an array or other sequence of elements.
If a is an array, expression a[i] provides access to i'th array element. By convention the first element has index 0. The type of index i is long.
If p is a pointer to type T object, expression p[i] is equivalent to expression *(p+i), thus providing access to i'th object of type T in a sequence of elements pointed by p. Again indexing starts from 0.
A subscript expression can also occur in the left side of an assignment statement. In that case the accessed element is assigned a new value.
The subscript operator [] can be overloaded, in which case it takes an operand of a user-defined type and yields a result of some user-defined or built-in type.
Invocation expression takes form x(a0, a1, ...), where x can be for example an identifier or a qualified id naming a function, delegate, class delegate, typedef, type or object of a class type, and a0, a1, ... is a possibly empty list of argument expressions separated by commas and enclosed in parentheses.
When x is a name of a function, overload resolution selects the best-matching function overload to be called with the specified arguments. The number of arguments must match exactly to the number of parameters of the function, but if the argument types do not match exactly the signature of the selected function overload, conversions take place. If no single best-matching function can be found, or many overloads are found to be equally good matches, the compiler issues an error.
When x is a name of a delegate or a class delegate, again the number of arguments must match exactly the number of parameters of the delegate or class delegate, and conversions may take place. Delegates and class delegates cannot be overloaded.
When x is a name of a typedef or a type, the expression yields a call of a constructor and creation of a temporary variable of that type. Constructor to be called is selected by overload resolution.
Constructed temporary can be bound to an rvalue reference without the need to call Rvalue function for it.
When x denotes an object of a class type (a variable or a temporary, for instance), and that class type overloads the function call operator (), the expression yields a call to that operator function. Matching function call operator is selected using overload resolution.
A primary expression can be a grouping expression, a literal, a basic type, a template identifier, an identifier, this or base access, a size-of expression, a typename expression, a cast expression, a construct expression, or a new expression.
Parentheses can be used to group subexpressions. Precedence of multiplicative operators is greater than the precedence of additive operators, so a + b * c means a + (b * c). If the addition is meant to be performed first, this can be accomplished by using parentheses: (a + b) * c.
Literals can be used for example as operands of arithmetic expressions or as arguments of function calls.
A basic type can be used for example in construction of a temporary or in a size-of expression.
A template identifier can be the name of a class template or a function template along with a template argument list. It can be used to refer to a class template specialization or to call a function template specialization, for instance.
Identifiers are used in expressions to refer to a variable, parameter, type, constant, namespace, typedef or function.
Keyword this refers to the current class object and base refers to the base class object in a member function context. They can be used for example to call a function of the same (this) or the base class (base) of the current class, or to refer to a member variable of the same (this) or the base class (base) of the current class.
A size-of expression yields the size of the specified object or type in bytes. The type of the result is long. The size of a class object may be bigger than the sum of the sizes of its members, because of alignment.
A typename expression yields the dynamic type name of the specified object or type. The type of the result is const char*. If p is a pointer to class T object, the static type of p is T*, but the actual type of the object p points to can be U, a type derived from T. In this case U is the dynamic type of expression *p and expression typename(*p) returns the fully qualified name of type U.
A cast expression performs explicit type conversion. It takes a target type enclosed in angle brackets and a source expression to be converted enclosed in parentheses. Any basic type excluding void can be explicitly converted to any other basic type excluding void. Any pointer type can be explicitly converted to any other pointer type. You can also cast away constness of an operand as follows. If p is of type const T*, cast<T*>(p) yields a plain pointer. Similarly, if c is of type const T&, cast<T&>(c) yields a plain reference.
A construction expression constructs an object "in place" into a memory location. It takes the type of an object to construct enclosed in angle brackets, and then a list of arguments p, a0, a1, ... enclosed in parentheses, where p is a pointer to the memory location in which to construct the object and a0, a1, ... is a possibly empty list of arguments separated by commas. If T is the type of the object to construct, the type of p can be either void* or T*. The expression yields a T* result.
A new expression creates an object of the specified type. If T is the type of object to create, sizeof(T) bytes of memory are allocated from the free store and then the object is constructed into that memory. The new expression takes the type of an object to construct and a possibly empty list of constructor arguments enclosed in parentheses as operands. It yields a pointer to the newly created object as a result.
A constant expression is an expression that is sufficiently simple so that it can be evaluated at compile time. It can contain literals, constants, enumeration constants, operators that do not involve taking an address of an object and invocations of constexpr functions.
Statements are used in programs to define flow of control, to evaluate expressions for side-effects, to assign values to variables, to construct local variables, to explicitly release allocated resources, to throw and handle exceptions, to test assertions and to compile statements conditionally.
A labeled statement consists of an identifier that acts as a target label for a goto statement, a colon, and a statement.
The control statements provide basic means for defining the flow of control of a function: sequence, selection and repetition.
A compound statement executes a sequence of statements in order. A compound-statement is a statement so it can be used whenever syntax allows a statement to occur.
A return statement returns control from the currently executing function to the caller of that function. If the function has a return type that is not void, the return statement must have an expression, a return value, that is evaluated and returned to the caller. Otherwise, if the function is a constructor, a destructor, or a member function or nonmember function that has a void return type, the return statement is not allowed to contain a return expression.
An if statement executes a statement conditionally. The if statement contains a bool-valued expression, the condition, that is evaluated first. If the condition evaluates to true, the statement following the condition is executed and control is then transferred to the statement that comes after the if statement. If the condition evaluates to false and the if statement has an else part, the statement following the else keyword is executed. If the condition evaluates to false and the if statement has no else part, control is transferred directly to the statement that comes after the if statement.
A while statement executes a statement repeatedly as long as a bool-valued expression, the condition of the while statement, evaluates to true. If the condition evaluates true, control is transferred to the statement following the condition of the while statement. Then control is transferred back to evaluating the condition, and so on, until the condition evaluates false. Then control is transferred to the statement that comes after the while statement. The statement contained by the while statement is executed zero or more times.
A do statement executes a statement repeatedly until a bool-valued expression, the condition of the do statement, evaluates to false. First the statement following the do keyword is executed. Then the condition of the do statement is evaluated. If the condition evaluates to true, control is transferred back to the statement following the do keyword, and so on, until the condition evaluates to false. Then control is transferred to the statement that comes after the do statement. The statement contained by the do statement is executed at least once.
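The difference between the two loops shows when the condition is false from the start. In this minimal sketch (variable names are arbitrary), the while body runs three times, whereas the do body runs exactly once even though its condition never holds.

```cmajor
using System;

void main()
{
    int i = 0;
    while (i < 3)   // condition is tested before each iteration
    {
        Console.WriteLine(i);
        ++i;
    }
    int j = 10;
    do              // body runs once before the condition is tested
    {
        Console.WriteLine(j);
        ++j;
    }
    while (j < 3);
}
```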
A for statement consists of a for-init-statement, a bool-valued expression, a for-loop-expression and a statement.
The for statement is commonly used as follows: the for-init-statement constructs a loop variable, and the expression evaluates some condition that depends on the loop variable. As long as the condition evaluates to true, the statement contained by the for statement is executed, the for-loop-expression that manipulates the loop variable is evaluated, and control is transferred back to testing the condition.
For example, here's a for loop that prints the integers 0 ... 9 to the standard output stream:
for (int i = 0; i < 10; ++i)
{
    Console.WriteLine(i);
}
A range-for statement executes a statement for each element of a container. The range-for statement consists of a type and name of a local variable that is bound to each element of a sequence of elements in turn. When the element is bound to the local variable, the statement contained by the range-for statement is executed.
Here's an example program with various range-for statements.
The container of the range-for statement must refer to an object of a class that contains types or typedefs named Iterator and ConstIterator and member functions Begin(), End(), CBegin() and CEnd() that return those iterator types: Begin(), End() must return an Iterator, and CBegin() and CEnd() must return a ConstIterator. The iterator types, Iterator and ConstIterator, must support two operations: a dereference operator overload for accessing an element of a sequence, and an increment operator overload for incrementing the iterator. These requirements are fulfilled by arrays and all container and string classes in the system library. When these requirements are fulfilled by some user-defined class, it also supports the range-for statement.
If the container refers to a const object, the range-for statement
for (T x : c) stmt;
is lowered by the compiler to the following sequence of statements:
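The lowered statements are not reproduced here. As a rough sketch only, the const case might expand along the following lines, assuming that C names the class of c and that ConstIterator supports inequality comparison (which the requirements listed above do not mention). The names it and end are placeholders, not the names the compiler actually generates.

```cmajor
{
    // Sketch for a const container: iterate with ConstIterator via CBegin()/CEnd()
    C.ConstIterator end = c.CEnd();
    for (C.ConstIterator it = c.CBegin(); it != end; ++it)
    {
        T x = *it;  // bind the current element to the loop variable
        stmt;
    }
}
```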
If the container refers to a non-const object, the range-for statement
for (T x : c) stmt;
is lowered by the compiler to the following sequence of statements:
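The lowered statements are not reproduced here. As a rough sketch only, the non-const case might expand along the following lines, assuming that C names the class of c and that Iterator supports inequality comparison (which the requirements listed above do not mention). The names it and end are placeholders, not the names the compiler actually generates.

```cmajor
{
    // Sketch for a non-const container: iterate with Iterator via Begin()/End()
    C.Iterator end = c.End();
    for (C.Iterator it = c.Begin(); it != end; ++it)
    {
        T x = *it;  // bind the current element to the loop variable
        stmt;
    }
}
```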
A break statement terminates a while, do or for loop by transferring control to the statement after the loop. It is also used to terminate a case or default statement.
A continue statement transfers control from inside of a while or do statement to the condition of the statement, and from inside of a for statement to the for-loop-expression of the for statement.
A goto statement transfers control to a labeled statement.
A switch statement consists of a condition expression that evaluates to an integral value, zero or more case statements and possibly a default statement. The condition of the switch statement is evaluated and control is transferred to the case statement with a matching value. If none of the case values match and the switch statement contains a default statement, control is transferred to the default statement. If none of the case values match and the switch statement does not contain a default statement, control is transferred to the statement that comes after the switch statement.
A case statement has a number of case values that must be integral compile-time constants and a possibly empty sequence of statements that are executed if the condition of the switch statement matches one of the case values. The case statement must be terminated by a break, goto case, goto default, return or throw statement.
A default statement has a possibly empty sequence of statements that are executed if the condition of the switch statement does not match any of the case values. The default statement must be terminated by a break, goto case, return or throw statement.
A goto case statement is used to transfer control from a case or default statement to a case statement.
A goto default statement is used to transfer control from a case statement to a default statement.
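The following sketch exercises each terminator described above. The Describe function is a hypothetical name, and the `goto case 2;` and `goto default;` spellings are assumed to follow the statement names used in the text.

```cmajor
using System;

// Hypothetical function used only to illustrate switch, case and default statements
void Describe(int n)
{
    switch (n)
    {
        case 0:
            Console.WriteLine("zero");
            break;
        case 1:
            Console.WriteLine("one");
            goto case 2;        // continue with the statements of case 2
        case 2:
            Console.WriteLine("small");
            break;
        case 3:
            goto default;       // hand over to the default statement
        default:
            Console.WriteLine("other");
            break;
    }
}
```

Because every case must end in an explicit terminator, control never falls through silently from one case to the next.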
An expression statement evaluates an expression but the possible result of the evaluation is not used. Instead the expression is evaluated for its side-effects. The evaluated expression can for example increment or decrement a variable, or call a function, delegate or class delegate.
An empty statement does nothing. It can be used when the syntax requires a statement but there's nothing to be done.
An assignment statement evaluates the expression that is on the right-hand side of the assignment operator and assigns it to an lvalue expression that is on the left-hand side of the assignment operator. An lvalue expression can be for instance a variable, a dereferenced pointer or iterator, or a reference-valued result of a function call.
A construction statement declares a local variable in its scope, allocates memory for it from the stack frame of the current function and sets an initial value for it. The local variable will have the given type and name. It is initialized to the value given on the right side of the assignment operator symbol, or to the result of a call to a constructor with arguments enclosed in parentheses. If no initializer is given, the variable will be default initialized.
If p is a pointer to an object of type T that was created using a new expression, the statement delete p calls the destructor of T (if T is a class type that has one) and then returns the memory allocated for the object to the free store. It is not an error to call delete for a null pointer.
If p is a pointer to an object of type T, the statement destroy p calls the destructor of T but does not free any memory. It is not an error to call destroy for a type T that has no destructor. In that case the statement has no effect.
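A minimal sketch of the two statements follows. Tracer and Release are hypothetical names. Release is meant for an object that was built in caller-owned storage with a construction expression, so that only its destructor needs to run.

```cmajor
using System;

public class Tracer
{
    public ~Tracer()
    {
        Console.WriteLine("~Tracer");
    }
}

// Hypothetical helper: runs the destructor but leaves the storage untouched.
public void Release(Tracer* p)
{
    destroy p;
}

void main()
{
    Tracer* p = new Tracer();
    delete p;       // calls ~Tracer(), then returns the memory to the free store

    Tracer* none = null;
    delete none;    // allowed: deleting a null pointer is not an error
}
```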
A throw statement throws or rethrows an exception. If the throw statement contains an expression, it must be an expression of a class type that is equal to or derived from the System.Exception class. If the throw statement does not contain an expression, it can be used only in an exception handler block to rethrow the current exception being handled.
Throwing an exception transfers control to a matching exception handler, if any. If no matching exception handler is found, the error message and stack trace of the exception are written to the standard error stream of the process and the execution of the process is ended with exit code 1.
Exceptions should be thrown by value and caught by reference (or reference to const), as in C++.
For example, if AlphaException is a class derived from the System.Exception class, the statement
throw AlphaException("error");
will throw an exception of type AlphaException with message "error".
A try statement executes a block of code, and handles one or more classes of exceptions that may be thrown from that block of code. The exception classes must be equal to or derived from the System.Exception class.
Exceptions should be thrown by value and caught by reference (or reference to const), as in C++.
For example, if AlphaException is a class derived from the System.Exception class, the statement
try
{
    Write();
}
catch (const AlphaException& ex)
{
    Console.WriteLine(ex.Message());
}
will handle an exception thrown from the Write() function call, if that exception is of class AlphaException or of a class derived from the AlphaException class.
An assert statement can be used to test bool-valued conditions that should always hold in a valid program. If an assertion expression evaluates to false, an error message "assertion failed" with a function name, source file name and line number along with a possibly incomplete call stack is printed to the standard error stream of the process, and the execution of the process is ended with exit code 254. Assert statements have no effect in a program compiled using the release configuration.
A conditional compilation statement includes statements to be compiled conditionally depending on symbols supplied from the command line or from the IDE.
The #if part contains a conditional compilation expression that is evaluated. If the expression evaluates to true, the statements following the expression are included in the compilation. Otherwise the expressions contained by the #elif parts are evaluated in turn. As soon as one of them evaluates to true, the statements following that expression are included in the compilation. If neither the #if expression nor any #elif expression evaluates to true and the statement has an #else part, the statements in the #else part are included in the compilation.
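As an illustration, a conditional compilation statement might look like the sketch below. DEBUG and TRACE are hypothetical symbols assumed to be supplied from the command line or the IDE. The parentheses around each expression and the closing #endif are assumptions about the concrete syntax, which this section does not spell out.

```cmajor
using System;

void main()
{
#if (DEBUG)
    Console.WriteLine("compiled with DEBUG");
#elif (TRACE)
    Console.WriteLine("compiled with TRACE");
#else
    Console.WriteLine("compiled with neither symbol");
#endif
}
```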
A conditional compilation expression can be a
Access specifiers are used to grant or reject access to a program entity from another program entity.
access‑specifier | → | public | protected | private | internal |
If no access specifier is given, the default access for a namespace-level entity is internal access, and for a class-level entity it is private access.
Type expressions are used for declaring the type of a variable, parameter or constant, or the return type of a function.
type‑expr | → | prefix‑type‑expr |
prefix‑type‑expr | → | const postfix‑type‑expr | postfix‑type‑expr |
postfix‑type‑expr | → | primary‑type‑expr ( member | pointer | rvalue‑ref | lvalue‑ref | array )* |
member | → | . identifier |
pointer | → | * |
rvalue‑ref | → | && |
lvalue‑ref | → | & |
array | → | [ constant‑expression? ] |
primary‑type‑expr | → | basic‑type | template‑id | identifier |
template‑id | → | qualified‑id < type‑expr ( , type‑expr )* > |
A primary type expression is a name of a basic or user-defined type, or a name of a class template with type arguments.
A const qualifier indicates the intent that an object is not meant to be changed. For example, the type expression const T& declares a reference to a constant object of type T, and const T* declares a pointer to a constant object of type T.
A member access in a type expression is used for accessing a member of a namespace or a class type. For example, if N is a namespace that contains type T, the type expression N.T* declares a pointer to the type T in namespace N.
A pointer qualifier is used for declaring pointer types. The value of a pointer object is the memory address of the object it points to or it can be a special null value that means it does not point to any object. The default value of a pointer type object is null.
A simple pointed-to object is accessed using the dereference expression. A member of a class object is accessed using the pointer member access expression.
The void* type represents a generic pointer type. Its value is a memory address or the null value, just like the value of other pointer types, but it carries no information about the type of the object it points to: it is a bare memory address. Other pointer types have an implicit conversion to the generic pointer type, and the generic pointer type can be explicitly cast to any other pointer type. The default value of a generic pointer type object is null.
To support systems programming, a void* value can be explicitly converted to a ulong value, and a ulong value can be explicitly converted to a void* value. Also a void* value can be explicitly converted to any delegate type value, and any delegate type value can be explicitly converted to a void* value.
The lvalue and rvalue reference qualifiers are used for declaring reference types. A reference always refers to some other object; it has no default value and it cannot be null. A reference is bound to the object it refers to when it is created, and it refers to that same object for its whole lifetime. When a reference is used as the left operand of an assignment, a new value is assigned to the object the reference refers to. When it is used as an operand in an expression, the value of the object the reference refers to is retrieved. In that sense it can be thought of as an "automatically dereferenced pointer".
Rvalue reference types are used for implementing move semantics.
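A small sketch of the "automatically dereferenced pointer" behavior of an lvalue reference follows. AddOne is a hypothetical function. Assigning through the reference parameter changes the caller's variable, so the program prints 42.

```cmajor
using System;

// Hypothetical function: 'value' refers to the caller's variable.
void AddOne(int& value)
{
    value = value + 1;  // assigns to the object the reference refers to
}

void main()
{
    int n = 41;
    AddOne(n);
    Console.WriteLine(n);
}
```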
An array qualifier is used for declaring an array type. An array is a sequence of objects of the same type. If that type is a pointer type the objects pointed to can actually be of different types though. An expression inside the square brackets is the length of the array. It is optional for constant arrays for which the compiler can calculate the length of the array from its initializing expression. If the length of the constant array is specified it must match the length of the array initializer.
If a denotes an array object, the length of it can be obtained using a.Length() notation. The type of the array length is long.
Multidimensional arrays can be declared as arrays of arrays. For example, int[3][3] x defines x as a two-dimensional array of nine integers.
Elements of an array are accessed using the subscript expression.
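The array features described above can be combined in a short sketch: a fixed-length array declaration, the Length() member, subscripting, and iteration with a range-for statement. The index is declared as long to match the type returned by Length().

```cmajor
using System;

void main()
{
    int[3] a;   // default-initialized array of three ints
    for (long i = 0; i < a.Length(); ++i)
    {
        a[i] = cast<int>(i) * 10;   // subscript expression on the left-hand side
    }
    for (int x : a)
    {
        Console.WriteLine(x);
    }
}
```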
A constant represents either a simple value of a basic or enumerated type, or an array or structure of values that the compiler must be able to evaluate at compile time.
constant | → | access‑specifier? const constant‑type constant‑name = constant‑value ; |
constant‑type | → | type‑expr |
constant‑name | → | identifier |
constant‑value | → | constant‑expression |
The value of a constant must be a constant‑expression, which means it must be sufficiently simple for the compiler to evaluate it at compile time. It cannot use the free store, so it cannot contain a new expression, for example. However, it can contain invocations of constexpr functions with constant arguments. Some algorithms in the system library are declared constexpr.
public class LiteralClass
{
    public constexpr LiteralClass(int x_, double y_) : x(x_), y(y_)
    {
    }
    public constexpr int X() const
    {
        return x;
    }
    public constexpr double Y() const
    {
        return y;
    }
    private int x;
    private double y;
}
public const int meaningOfLife = 42;
public const double pi = 3.1415926;
public const int[] numberArray = [1, 2, 3];
public const long lengthOfNumberArray = numberArray.Length();
public const LiteralClass literalClass = {1 + 2 * 3, Min(pi, 4.13)};
public const double y = literalClass.Y();
A global variable represents a namespace-level variable that has a type and a name and possibly an access specifier and an initializer. A global variable has a modifiable value of its type.
global‑variable | → |
access‑specifier?
global‑variable‑type
global‑variable‑name (= global‑variable‑initializer)? ; |
global‑variable‑type | → | type‑expr |
global‑variable‑name | → | identifier |
global‑variable‑initializer | → | constant‑expression |
A global variable may have an access specifier. If it has no access specifier, the default access is private. Private access means that the variable is accessible only from within the compile unit in which it is defined. There can be many privately accessible global variables with the same name defined in different compile units. If the access is public or internal, the variable with the given name must be unique within the program.
A global variable may have an initializer. The initializer must be a constant expression. If a global variable has no initializer, it will be default-initialized (a.k.a. zero-initialized). Global variables support only static (compile-time) initialization. If you need a variable with dynamic initialization, consider using a class with a static constructor that initializes a static member variable of that class. Static constructors provide dynamic, thread-safe, one-time initialization.
An enumerated type defines a user-defined type that contains a list of enumeration constants.
enumerated‑type | → |
access‑specifier?
enum
enumerated‑type‑name
underlying‑type? { enumeration‑constants } |
enumerated‑type‑name | → | identifier |
underlying‑type | → | : type‑expr |
enumeration‑constants | → | enumeration‑constant (, enumeration‑constant)* |
enumeration‑constant | → | enumeration‑constant‑name ( = enumeration‑constant‑value )? |
enumeration‑constant‑name | → | identifier |
enumeration‑constant‑value | → | constant‑expression |
An enumerated type has an underlying type that must be, if specified, a basic integer type. If the underlying type is not specified, int is used as the underlying type.
There's an implicit conversion from the enumerated type to the underlying type, and an explicit conversion (cast) from the underlying type to the enumerated type. For an example, see NextMonth function and setting permissions in the examples.
The value of an enumeration constant, if specified, must be a constant expression that evaluates to a value that is convertible to the underlying type of the enumerated type. If the value of the first enumeration constant is not specified, it has value zero. If the value of any other enumeration constant is not specified, it has the value of the preceding enumeration constant plus one.
Enumeration constants are accessed with EnumType.enumConstant syntax.
public enum TrafficLight
{
    green, yellow, red
}
public enum Month : sbyte
{
    january = 1, february, march, april, may, june, july, august, september, october, november, december
}
public enum Permission : byte
{
    read = 1u << 0u,
    write = 1u << 1u,
    execute = 1u << 2u
}
public Month NextMonth(Month month)
{
    return cast<Month>(month % 12 + 1);
}
void main()
{
    TrafficLight light = TrafficLight.green;
    Month month = Month.january;
    Month next = NextMonth(month);
    Permission permissions = cast<Permission>(Permission.read | Permission.write);
}
A function represents computation that can be invoked at run time and in simple cases also at compile time. The computation is defined by the body of the function. The body is a compound statement that contains statements that define the control flow of the function. Invocation of a function may yield a value, or the function can be a void function that is invoked for its side-effects.
function | → |
attributes? function‑specifiers return‑type function‑group‑id template‑parameter‑list? parameters where‑constraint? (compound‑statement | ;) |
function‑specifiers | → | (access‑specifier | constexpr | inline | cdecl | extern | nothrow | throw)* |
return‑type | → | type‑expr |
function‑group‑id | → | identifier | operator‑function‑group‑id |
operator‑function‑group‑id | → | operator (<< | >> | == | = | < | −> | ++ | −− | + | − | * | / | % | & | | | ^ | ! | ~ | [] | ()) |
template‑parameter‑list | → | < template‑parameter ( , template‑parameter )* > |
template‑parameter | → | identifier (= default‑value )? |
default‑value | → | type‑expr |
parameters | → | ( parameter‑list? ) |
parameter‑list | → | parameter (, parameter)* |
parameter | → | parameter‑type parameter‑name? |
parameter‑type | → | type‑expr |
parameter‑name | → | identifier |
A regular function has no other specifiers than a possible access specifier.
For an example of a regular function, see strlen.cm.
The constexpr specifier enables compile-time evaluation of a function when the function arguments are constant expressions. If the evaluation succeeds, the function call is replaced by the compile-time constant that is the result of the evaluation. A constexpr function can also be used when the arguments are not constant expressions. In those cases the function is called just like a regular function. This saves having to write two versions of the same function, a constexpr version and a non-constexpr version.
A constexpr function cannot have the following kinds of statements: goto, range-for, delete, destroy, switch, case, default, goto case, goto default, throw, try, or a conditional compilation statement.
The constexpr specifier is often combined with the inline and nothrow specifiers because constexpr functions are typically short and do not throw exceptions.
For an example of a constexpr function, see align.cm.
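As a minimal sketch of the dual use described above, the hypothetical Square function below is evaluated at compile time when it initializes a constant and is called like a regular function when its argument is a run-time value.

```cmajor
// Hypothetical constexpr function; short and non-throwing, so inline and nothrow are added too.
public constexpr inline nothrow int Square(int x)
{
    return x * x;
}

// Constant argument: the call can be evaluated by the compiler.
public const int nine = Square(3);

void main()
{
    int n = 4;
    int sixteen = Square(n);    // non-constant argument: an ordinary function call
}
```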
The inline specifier is a hint to the compiler that the function should be inlined when the program is compiled using the release configuration. When a function is inlined, the function call is replaced by the body of the function, so inlining is best used for short functions. Function inlining avoids function call overhead and opens up further optimization opportunities, because the generated code can be adapted to the arguments of the function call. Function inlining helps the compiler generate efficient code.
The cdecl specifier disables name mangling. Name mangling makes it possible to have many functions with the same group name but different parameter types. When name mangling is disabled there can be only a single function with the given group name in the whole program and all libraries it uses. The cdecl specifier can ease interoperability with libraries written in other languages.
The nothrow specifier indicates that the function is not supposed to throw any exceptions: it either does not call any function that can throw exceptions, or it handles all exceptions arising from the functions it calls. The nothrow specifier helps the compiler generate efficient code.
The throw specifier can be used to emphasize the fact that a function may throw exceptions. It has no additional semantics compared to a function without the throw specifier and it has purely informational value.
When the function definition has a template parameter list and possibly a constraint, it defines a function template. The compiler instantiates, or creates a specialized version from the function template for each different set of argument types. When a function is instantiated, the template parameters of the function template are substituted with concrete types and the function is compiled using those substituted types. The concrete types can be specified in the function call by using a template-id, or if not specified, they can be automatically deduced by the compiler from the argument types of the function call.
Function templates participate in overload resolution. When two or more function templates have the same group name and number of parameters but different constraints, they are overloaded based on those constraints. The compiler checks the constraints at the function call site by substituting the type parameters in the constraints with the concrete types in the function call, and accepts or rejects overloads according to whether the constraints are satisfied. If two or more function templates satisfy the constraints, the compiler selects the one that has the strictest constraints.
For example, see these two Next function templates in the system library. If a Next function is called with a type that satisfies both the ForwardIterator concept and the RandomAccessIterator concept, the random access function overload is called, because the RandomAccessIterator concept refines the BidirectionalIterator concept, which in turn refines the ForwardIterator concept.
For an example of a function template, see min.cm.
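A constrained function template might be sketched as follows. Largest is a hypothetical name, and the `where T is LessThanComparable` constraint form and the System.Concepts namespace are assumptions about the system library rather than something this section defines. The first call lets the compiler deduce T from the argument types, and the second names the type argument with a template-id.

```cmajor
using System;
using System.Concepts;

// Hypothetical function template; the constraint syntax is an assumption.
public T Largest<T>(T a, T b) where T is LessThanComparable
{
    if (a < b)
    {
        return b;
    }
    return a;
}

void main()
{
    int x = 3;
    int y = 7;
    Console.WriteLine(Largest(x, y));               // T deduced as int
    Console.WriteLine(Largest<double>(2.5, 1.5));   // T given explicitly
}
```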
The System.Meta is a special namespace that contains intrinsic functions. Intrinsic functions can be called just like regular functions, but they are special in the sense that they are not compiled from source code as regular functions are, but they are implemented internally inside of the compiler. The intrinsic functions in the System.Meta namespace take a type parameter and return information about the type supplied. They can be used in the predicate constraint expressions.
A function marked with a system_default attribute is called a system-default function. Function overload resolution, as described in the next section, is done in two phases. In the first phase the compiler ignores functions marked with the system_default attribute. If the set of viable functions is not empty, a best-matching function is chosen as described in the next section. Only if the set of viable functions is empty does the compiler proceed to the second phase. In the second phase the compiler also includes functions marked with the system_default attribute as viable functions, and selects the best-matching function as described in the next section. This feature is used in the system library to provide default implementations of output operators for container types. Those default implementations can be overridden by user-defined output operators.
Overload resolution is done in three stages:
Then the set of viable functions is filtered to form a set of overload candidates.
If the viable function is not a function template, it becomes an overload candidate if there exists a valid sequence of conversions from each argument type to the corresponding parameter type of the viable function. The sequence of needed conversions is memorized for the final stage.
If the viable function is a function template, the argument types are bound to the template parameters of the function template. In this case the viable function can be rejected because arguments cannot be bound or there is no valid conversion sequence for a bound parameter.
If the viable function is a constrained function template, after successfully binding the argument types, the constraint is checked by substituting parameter types in the constraint with bound argument types. In this case the viable function can be rejected because the constraint is not satisfied. Otherwise the viable function survives to be an overload candidate and its constraint is memorized for the final stage.
In practice, overload resolution can be performed up to three times with different argument combinations to resolve a single function call, because member functions have an implicit this parameter that can be bound to the receiver, or to the current this parameter if the call is issued from inside a member function.
The overload candidates are sorted according to the ordering rules that are informally described as follows: When comparing two overload candidates a and b:
First compare the argument conversions to a and b: for all arguments, count the number of better argument conversions to a compared to b. If the number of better argument conversions to a is greater than the number to b, a is better than b. A better argument conversion means that needing no conversion is better than needing a conversion, and when both need a conversion, a smaller conversion distance is better than a greater one.
Then, if a is not a function template and b is a function template, a is better than b. This is to ensure we don't have to instantiate the same function template for the same argument types many times.
Then, if a is not a function template specialization and b is a function template specialization, a is better than b.
Then, if a is a constrained function template and b is an unconstrained function template, a is better than b.
Finally, if a and b are both constrained function templates and the constraint of a is stricter than the constraint of b, a is better than b.
A typedef introduces an alias name for a type expression.
typedef | → | access‑specifier? typedef type‑expr identifier ; |
public typedef String<char> string;
public typedef String<wchar> wstring;
public typedef String<uchar> ustring;
A class definition defines a user-defined type. It may contain member functions, member variables, types, typedefs and constants. It may have a base class and it can implement any number of interfaces.
A regular class may have each kind of class member. It may have a base class and can implement interfaces.
An abstract class is a class that cannot be instantiated. Typically it is the base class of a class hierarchy, or an intermediate class that derives from a base class but is not a concrete class because it has or inherits an abstract member function. An abstract class may have each kind of class member. In particular, it may have abstract member functions, but it does not have to. If a class has or inherits an abstract member function, it must be explicitly declared abstract.
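A minimal sketch of an abstract base class with one concrete derived class follows. Shape and Square are hypothetical names, and the member function specifiers used here (abstract, virtual, override) are assumed to work as their keywords suggest, since member functions are described elsewhere.

```cmajor
using System;

public abstract class Shape
{
    public virtual ~Shape()
    {
    }
    // Abstract member function: no body, so Shape must be declared abstract.
    public abstract double Area() const;
}

public class Square : Shape
{
    public Square(double side_) : side(side_)
    {
    }
    public override double Area() const
    {
        return side * side;
    }
    private double side;
}

void main()
{
    Square square(2.0);
    Shape* shape = &square;     // a Shape can only be used through a concrete derived class
    Console.WriteLine(shape->Area());
}
```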
A static class can have only
For an example of a static class, see console.cm.
A static constructor is a special member function that is used for initializing static member variables of the class it belongs to. The name of the static constructor must be the name of the class that contains it.
A static constructor is typically used in classes that need one-time initialization. An example of such a class is a singleton. Each static constructor is guarded by a recursive mutex and a Boolean flag provided by the Cmajor runtime. Together they ensure that initialization is thread-safe and happens exactly once. A static constructor gets executed before control arrives at the body of a constructor or a static member function of the same class, or before a static member variable of the class is accessed from outside of the class.
For an example of a singleton class that has a static constructor, take a look at phonebook.cm. The singleton is accessed using the static Instance() member function. Before Instance() returns a reference to a static instance member variable, the static constructor gets executed if it has not already been executed. The static constructor creates an instance of the PhoneBook class and assigns it to a static instance member variable. Creating an instance of the PhoneBook class involves calling the default constructor of the class, which in turn calls the static constructor. But this time the initialization flag provided by the runtime has already been set, so control returns from the static constructor right away.
At the start of the static constructor, the language implementation checks whether the initialization flag is set. If it is set, control returns and the body of the static constructor is not executed. Otherwise the language implementation locks the recursive mutex and then checks again whether the initialization flag is set; if it is, it unlocks the mutex and returns. This is called the double-checked locking pattern. Otherwise the initialization has not yet been done, so the language implementation sets the initialization flag to true, executes the body of the static constructor, unlocks the mutex and returns.
Like a destructor, a static constructor should not throw any exceptions. Although the implementation ensures that the recursive mutex gets unlocked regardless of whether control leaves the body of the static constructor normally or by throwing an exception, things probably won't work, because critical data has not been fully initialized. When it comes to exceptions, the static constructor should handle them all gracefully, or give up and exit the program.
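A class with a static constructor might be sketched as follows. Settings and ComputeStartValue are hypothetical names, and the `static` keyword in front of the constructor name is an assumption about the concrete syntax. The static member variable is initialized at run time, the first time a static member function of the class is called.

```cmajor
using System;

// Hypothetical function standing in for a value that is only known at run time.
int ComputeStartValue()
{
    return 6 * 7;
}

public class Settings
{
    // Static constructor: runs once, before StartValue() is first entered.
    static Settings() : startValue(ComputeStartValue())
    {
    }
    public static int StartValue()
    {
        return startValue;
    }
    private static int startValue;
}

void main()
{
    Console.WriteLine(Settings.StartValue());
}
```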
If the class having a static constructor has a member variable that has a nontrivial destructor, the class should implement a destructor. If the destructor does not have other things to do besides calling destructors of the member variables, this can be done by using a default destructor:
public class MyClass
{
    // ...
    public default ~MyClass();
}
A constructor is a special member function that creates an object of the class it belongs to. The task of a constructor is to initialize member variables and allocate resources. Memory is also considered as a kind of resource. The name of the constructor must be the name of the class that contains it.
Member variables and a possible base class object can be initialized by using an initializer list. An initializer list consists of initializers separated by commas. There are three kinds of initializers:
There are three kinds of special constructors: a default constructor, a copy constructor and a move constructor.
The default constructor has signature
ClassName()
Its purpose is to default initialize an object of the class it belongs to. If a class has no user-defined default constructor, the compiler will generate one if it is needed. The generated default constructor calls the default constructor of the base class, if there is one, and default initializes all member variables of the class.
The user can request the automatic generation of the default constructor by using the default keyword:
public default ClassName();
A defaulted default constructor cannot have a body.
The user can suppress the automatic generation of the default constructor by using the suppress keyword:
suppress ClassName();
A suppressed default constructor cannot have a body. The compiler reports an error for any attempt to call a suppressed default constructor.
A class may have at most one default constructor.
The copy constructor has signature
ClassName(const ClassName&)
It takes another object of the same class as an lvalue-reference-to-const parameter and typically copies members from that object by using an initializer list. If a class has no user-defined copy constructor, the compiler will generate one if it is needed. The generated copy constructor copies the base class object, if there is one, and the values of all member variables from the other object.
The user can request the automatic generation of the copy constructor by using the default keyword:
public default ClassName(const ClassName&);
A defaulted copy constructor cannot have a body.
The user can suppress the automatic generation of the copy constructor by using the suppress keyword:
suppress ClassName(const ClassName&);
A suppressed copy constructor cannot have a body. The compiler reports an error for any attempt to call a suppressed copy constructor.
A class may have at most one copy constructor.
The move constructor has signature
ClassName(ClassName&&)
It takes another object of the same class as an rvalue-reference parameter and typically moves members from that object, either by using an initializer list or by implementing the move in the body of the move constructor.
The user can request the automatic generation of the move constructor by using the default keyword:
public default ClassName(ClassName&&);
A defaulted move constructor cannot have a body.
The user can suppress the automatic generation of the move constructor by using the suppress keyword:
suppress ClassName(ClassName&&);
A suppressed move constructor cannot have a body. The compiler reports an error for any attempt to call a suppressed move constructor.
A class may have at most one move constructor.
Whether the class has a user-defined constructor or a compiler-generated one, the language implementation will complete it with the following actions:
The purpose of the destructor is to release allocated or otherwise obtained resources. Memory is also considered a kind of resource. The name of the destructor must be the name of the class that contains it, preceded by a tilde, as in ~MyClass().
If a class does not have a user-defined destructor and either the class is polymorphic or it has a member variable that has a nontrivial destructor, the compiler will generate a destructor for the class. If the class has a polymorphic base class, the generated destructor will be set as overridden, otherwise, if the class is polymorphic, it will be set virtual, otherwise it will remain as a regular destructor.
Whether the class has a user-defined destructor or a compiler-generated one, the language implementation will complete it with the following actions:
A destructor may have the following specifiers:
A class may have at most one destructor.
A member function can use nonstatic and static member variables when it performs its job.
There are two kinds of special member functions left: a copy assignment operator and a move assignment operator.
The copy assignment operator has signature
void operator=(const ClassName&)
It takes another object of the same class as an lvalue-reference-to-const parameter. If a class has no user-defined copy assignment operator, the compiler will generate one if it is needed. The generated copy assignment operator assigns the base class object and the member variables from the passed argument.
The user can request the automatic generation of the copy assignment operator by using the default keyword:
public default void operator=(const ClassName&);
A defaulted copy assignment operator cannot have a body.
The user can suppress the automatic generation of the copy assignment operator by using the suppress keyword:
suppress void operator=(const ClassName&);
A suppressed copy assignment operator cannot have a body. The compiler reports an error for any attempt to call a suppressed copy assignment operator.
A class may have at most one copy assignment operator.
The move assignment operator has signature
void operator=(ClassName&&)
It takes another object of the same class as an rvalue-reference parameter. If a class has no user-defined move assignment operator, the compiler will generate one if it is needed. The generated move assignment operator calls the move assignment operator of the base class with the base class object of the argument, and swaps all member variables of the current object with the member variables of the argument.
The user can request the automatic generation of the move assignment operator by using the default keyword:
public default void operator=(ClassName&&);
A defaulted move assignment operator cannot have a body.
The user can suppress the automatic generation of the move assignment operator by using the suppress keyword:
suppress void operator=(ClassName&&);
A suppressed move assignment operator cannot have a body. The compiler reports an error for any attempt to call a suppressed move assignment operator.
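The generated move assignment operator is described above as swapping member variables with the argument, which means the argument ends up holding the old values of the target rather than being emptied. The following C++ sketch models that behaviour for a class without a base class; the Person type and the MoveAssign name are assumptions made for this illustration, and the code is not Cmajor.

```cpp
#include <string>
#include <utility>

// Hypothetical class with two member variables and no base class.
struct Person
{
    std::string name;
    int age;

    Person() : name(), age(0) {}

    // Models a generated move assignment: every member variable of the
    // current object is swapped with the corresponding member of the argument.
    void MoveAssign(Person&& that)
    {
        std::swap(name, that.name);
        std::swap(age, that.age);
    }
};
```

After a.MoveAssign(std::move(b)), object a holds the former values of b, and b holds the former values of a, so the old resources of a are released when b is destroyed.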
A member function can have the following specifiers:
A member variable has a specified type and name. A member variable may have an access specifier and it may be declared static. Static member variables can be initialized in a static constructor. Nonstatic member variables are also called instance variables. They can be initialized in a constructor.
In Cmajor a class may have a single base class. If class B is the base class of class A we say also that A derives from B, and that A inherits members from B. When A derives from B we may think that A is-a-kind-of B. Inheritance relationship allows us to build hierarchies of classes some of which are base classes and others inheriting from them. If class A derives from class B, A may override some or all the abstract, virtual, or already overridden member functions of B.
For an example of a class hierarchy, see vehicles.cm. It contains a hierarchy of vehicle classes: bicycle and car are vehicles, and truck is a kind of car.
When the class definition has a template parameter list and possibly a constraint, it defines a class template. The compiler instantiates, or creates a specialized version from the class template for each different set of template argument types. An instantiated class template is called a class template specialization. When a class template is instantiated, the template parameters of the class template are substituted by specified concrete types and the declarations of the class are type checked using those substituted types. Those concrete types can be specified by using a template-id. When a member function of a class template specialization is called, the compiler instantiates it using the specified types. The compiler instantiates automatically the following member functions for each class template specialization:
For an example of a class template, see unique_ptr.cm. It's a smart pointer that implements unique ownership.
Sometimes the compiler fails to instantiate all needed member functions of a class template specialization. This happens especially when the class template inherits from an abstract base class and the derived class template overrides some member function(s) of the abstract base class. These instantiation failures manifest themselves as linker errors.
To get rid of the linker errors, the programmer may ask the compiler to instantiate all member functions of a class template by using a full instantiation request. The full instantiation request reuses the keywords new and class. It should be placed inside a namespace scope or in the global namespace. The syntax is as follows:
new class template-id;
The primary class template and the template arguments may be either user-defined classes or classes defined in the system library.
For example, suppose a class template System.Counter has an overridden Dispose member function. If the compiler fails to instantiate the Dispose function for the specialization System.Counter<MyClass>, the programmer may place a full instantiation request inside some namespace:
namespace MyNamespace { new class System.Counter<MyClass>; }
An interface is a list of member function signatures that describe some behaviour. A class can implement any number of interfaces. When a class implements an interface it must provide implementation for the member functions contained by the interface.
interface | → | attributes? access‑specifier? interface interface‑name { interface‑content } |
interface‑name | → | identifier |
interface‑content | → | interface‑member‑function* |
interface‑member‑function | → | return‑type interface‑member‑function‑name parameters ; |
interface‑member‑function‑name | → | identifier |
A delegate type represents a function signature type.
delegate | → | delegate‑specifiers delegate return‑type delegate‑name parameters ; |
delegate‑specifiers | → | (access‑specifier | nothrow | throw)* |
delegate‑name | → | identifier |
An object of a delegate type can be bound to a nonmember function or a static member function by assigning the name of that function to the delegate type object. If dlg is an object of a delegate type, the function that is currently bound to the dlg object can be called using the syntax dlg(arg1, arg2, ...).
For an example of a delegate, see delegate.cm.
To support systems programming, a delegate type value can be explicitly converted to a void* value, and a void* value can be explicitly converted to a delegate type value.
A class delegate type represents a member function signature type.
class‑delegate | → | class‑delegate‑specifiers class delegate return‑type class‑delegate‑name parameters ; |
class‑delegate‑specifiers | → | (access‑specifier | nothrow | throw)* |
class‑delegate‑name | → | identifier |
An object of a class delegate type can be bound to a specific member function of a specific class object. If clsdlg is an object of a class delegate type, the member function that is currently bound to the clsdlg object can be called using the syntax clsdlg(arg1, arg2, ...).
For an example of a class delegate, see class_delegate.cm.
A concept is a named collection of requirements for a type or for a group of types. Those requirements are called constraints. Constraints can be thought of as Boolean expressions, or predicates, that operate on properties of types.
Concepts are checked as part of overload resolution. When a concept is checked, the type parameters it contains are substituted with argument types of the function call and then the constraint expressions in the body of the concept are evaluated using those substituted types. If all the results of these evaluations are true, we say that the type or group of types satisfy the concept, or conform to the concept. Those overloads whose constraint expressions yield true result, form the set of overload candidates.
A concept may refine another concept. Overload resolution always selects the overload whose constraint expression is the most strict, that is, the one that contains the most refined concepts that are satisfied by the argument types of the function call.
A concept may also contain axioms. Axioms are not checked or evaluated in any way by the compiler, but they express semantic facts about properties of types that should always hold when the substituted type or types conform to the concept containing the axiom. Axioms are meant to serve as information for the programmer.
A constraint in the body of a concept can be a typename constraint, a signature constraint, or an embedded constraint.
Evaluation of a typename constraint yields true, if the substituted type contains a typedef or type whose name is equal to the identifier contained by the constraint, false otherwise.
For examples of typename constraints, see Container concept in the system library. It contains three typename constraints: T.ValueType, T.Iterator and T.ConstIterator. This means that if a type T satisfies Container concept it must contain three typedefs named ValueType, Iterator and ConstIterator.
Evaluation of a signature constraint yields true if there exists a constructor, destructor, member function or function whose signature matches the given constructor, destructor, member function or function signature respectively, and false otherwise. A destructor constraint is always satisfied, because a trivial destructor matches a destructor signature.
Here's an example of a concept that contains a constructor constraint:
public concept MoveConstructible<T>
{
T(T&&);
}
Here's an example of a concept that contains a destructor constraint:
public concept Destructible<T>
{
~T();
}
Here's an example of a concept that contains a member function constraint:
public concept Container<T>
{
// ...
long T.Count();
}
Here's an example of a concept that contains a function constraint:
public concept LessThanComparable<T>
{
bool operator<(T, T);
// ...
}
An embedded constraint is a where constraint that is embedded in the concept body. An embedded constraint yields true, if the where-constraint yields true, and false otherwise.
A where constraint consists of the keyword where and a constraint expression. A where-constraint yields true, if the constraint expression yields true, and false otherwise.
A constraint expression can be a disjunctive constraint expression, a conjunctive constraint expression, a primary constraint expression, an atomic constraint expression, a predicate constraint expression, an is-constraint expression, or a multiparam-constraint expression.
A disjunctive constraint expression is a sequence of conjunctive constraint expressions separated by the keyword or.
A disjunctive constraint expression
a or b
yields true if a, b, or both yield true, and false otherwise.
A conjunctive constraint expression is a sequence of primary constraint expressions separated by the keyword and.
A conjunctive constraint expression
a and b
yields true if both a and b yield true, and false otherwise.
A primary constraint expression is a parenthesized constraint expression, or an atomic constraint expression.
A primary constraint expression yields true if the parenthesized constraint expression or the atomic constraint expression it consists of yields true, and false otherwise.
An atomic constraint expression is either a predicate constraint expression, an is-constraint expression, or a multiparam-constraint expression. It yields true if the predicate constraint expression, is-constraint expression or multiparam-constraint expression respectively yields true, and false otherwise.
A predicate constraint is an invocation of a Boolean-valued constexpr or intrinsic function. It yields true if evaluation of the corresponding function yields true, and false otherwise.
An is-constraint expression is either of the form
type is type
or of the form
type is concept
The first form yields true if the type on the left-hand side is equal to the type on the right-hand side when possible reference and const qualifiers are removed, and false otherwise.
The second form yields true if the type on the left-hand side conforms to the concept on the right-hand side, and false otherwise. The conformance is checked by substituting the type on the left-hand side for the single type parameter of the concept on the right-hand side and evaluating the constraint expressions in the body of the concept. If all the constraint expressions of the concept yield true, the type conforms to the concept, otherwise it does not.
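The first form of the is-constraint compares two types after removing reference and const qualifiers. The same comparison can be modelled with C++ type traits; the following sketch is an analogy only, not how the Cmajor compiler evaluates the constraint, and the names StripQualifiers and IsSameType are assumptions made for this example.

```cpp
#include <type_traits>

// Removes a possible reference qualifier and then a possible const qualifier from T.
template<typename T>
struct StripQualifiers
{
    using type = typename std::remove_const<typename std::remove_reference<T>::type>::type;
};

// Models "Left is Right" for two types: the value member is true
// if the two types are equal once their qualifiers have been stripped.
template<typename Left, typename Right>
struct IsSameType
    : std::is_same<typename StripQualifiers<Left>::type, typename StripQualifiers<Right>::type>
{
};
```

With this model, IsSameType<const int&, int>::value is true, while IsSameType<int, long>::value is false.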
A multiparam-constraint expression is of the form
concept<type1, type2, ..., typen>
It yields true, if the types type1, type2, ..., typen conform to the n-parametric concept concept, and false otherwise. The conformance of the concept is checked by substituting type parameters of the concept with types type1, type2, etc. respectively and evaluating the constraint expressions in the body of the concept. If all the constraint expressions of the concept yield true, the types conform to the concept, otherwise they do not conform to the concept.
An axiom consists of a possible identifier, a possibly empty list of parameters and a body enclosed in braces. A body of an axiom is a possibly empty sequence of axiom statements. Each axiom statement is a Boolean-valued expression such as an equivalence, an implication, a disjunction, a conjunction, an equality or a relational expression. If a type conforms to a concept that contains these axioms, these axioms should always be true for such a type.
Here's an example of a concept that contains axioms:
public concept LessThanComparable<T>
{
// ...
axiom irreflexive(T a) { !(a < a); }
axiom antisymmetric(T a, T b) { a < b => !(b < a); }
axiom transitive(T a, T b, T c) { a < b && b < c => a < c; }
axiom total(T a, T b) { a < b || a == b || a > b; }
axiom greaterThan(T a, T b) { a > b <=> b < a; }
axiom greaterThanOrEqualTo(T a, T b) { a >= b <=> !(a < b); }
axiom lessThanOrEqualTo(T a, T b) { a <= b <=> !(b < a); }
}
Functions, classes, member functions, member variables and interfaces can have attributes.
Attributes are name-value pairs declared between brackets. The attribute declaration precedes syntactically the associated function, class, variable or interface.
attributes | → | [ ( attribute ( , attribute )* )? ] |
attribute | → | attribute‑name ( = attribute‑value )? |
attribute‑name | → | id‑char‑sequence |
attribute‑value | → | " ( [^"\\\r\n] | char‑escape )* " |
Attributes are name-value pairs that can be attached to programming constructs. An attribute name is an identifier recognized by the compiler or a compiling tool, and an attribute value is a string. If the value of an attribute is not explicitly given, it implicitly has the value "true". Attributes can be used, for example, to generate serialization code or for similar tasks.
Currently the Cmajor compiler recognizes four attributes:
Cmajor projects consist of source files that are also called compile units. Each compile unit contains a possibly empty sequence of using directives followed by a possibly empty sequence of definitions.
There are two kinds of using directives: using-alias directives and using-namespace directives.
A using-alias directive introduces an alternate name, a simple identifier, for a namespace-level entity referred by its fully qualified name.
For example, using-alias directive
using Console = System.Console;
makes it possible to refer to the System.Console class as bare Console instead of its fully qualified name System.Console.
A using-namespace directive makes contents of a namespace available in a compile unit using simple identifiers of namespace-level entities.
For example, using-namespace directive
using System;
makes the contents of the System namespace available in the current compile unit. This means that one can refer to the System.Console class as bare Console instead of its fully qualified name System.Console, for example.
However, if there are two entities of the same name, Foo, for example, in two namespaces, Alpha and Beta, and the contents of both namespaces are made available with using-namespace directives:
using Alpha;
using Beta;
one must refer to Foo using its fully qualified name, Alpha.Foo, for example.
Definitions can appear inside a namespace, or at the global namespace level, outside any other namespace.
A namespace definition consists of the keyword namespace followed by the name of the namespace followed by its contents. The name of a namespace can be a simple identifier, Alpha, for example, or a fully qualified identifier, Alpha.Beta.Gamma, for example. Then
namespace Alpha.Beta.Gamma { ... }
is a shorthand notation for
namespace Alpha { namespace Beta { namespace Gamma { ... } } }
Namespaces can be used to organize entities in a library under a common name. They can also be used to prevent name clashes. If a graphics library Graphics contains a function named draw that draws a shape, and a lottery library Lottery also contains a function named draw that shows a lottery draw, both can be used in the same program, provided that the entities of the graphics library are defined in the namespace Graphics and the entities of the lottery library in the namespace Lottery:
namespace Graphics { void draw() { ... } }
namespace Lottery { void draw() { ... } }
void main()
{
Graphics.draw();
Lottery.draw();
}
Namespaces are open: many compile units can add definitions to the same namespace.
A namespace, including the global namespace, can contain the following:
A namespace is unnamed if it lacks an identifier. The contents of an unnamed namespace are available only in the source file in which it appears. The names inside an unnamed namespace do not collide with identical names in other unnamed namespaces. The mangled name of an unnamed namespace will be "unnamed_ns_UNIQUE_HEX_STRING" where UNIQUE_HEX_STRING is the SHA-1 hash of a random UUID. The mangled names of the entities belonging to an unnamed namespace will therefore be unique.
There is one project for each Cmajor program and library. By convention each project should be in its own directory and have a project file that ends with .cmp extension. A project file contains the name and type of the project and lists the source files belonging to it. A project can reference another project. Reference dependencies must be acyclic.
project‑file | → | project project‑name ; project‑declarations |
project‑name | → | qualified‑id |
project‑declarations | → | project‑declaration* |
project‑declaration | → | reference‑declaration | source‑file‑declaration | resource‑file‑declaration | text‑file‑declaration | target‑declaration |
reference‑declaration | → | reference file‑path ; |
source‑file‑declaration | → | source file‑path ; |
resource‑file‑declaration | → | resource file‑path ; |
text‑file‑declaration | → | text file‑path ; |
target‑declaration | → | target = target ; |
target | → | program | winguiapp | winapp | library | winlib | unitTest |
file‑path | → | < [^>]+ > |
The target declaration defines the type of the project:
The winguiapp, winapp and winlib project types are available only on Windows platform.
The project file can contain declarations for the following file types:
Here's a project file alpha.cmp for a project named Alpha that references two library projects Beta and Gamma:
project Alpha;
target=program;
reference <../beta/beta.cmp>;
reference <../gamma/gamma.cmp>;
source <main.cm>;
Here's the project file beta.cmp:
project Beta;
target=library;
source <beta.cm>;
and here's the project file gamma.cmp:
project Gamma;
target=library;
source <gamma.cm>;
The directory structure is as follows:
|
+--alpha
|  |
|  +--alpha.cmp
|  |
|  +--main.cm
|
+--beta
|  |
|  +--beta.cmp
|  |
|  +--beta.cm
|
+--gamma
   |
   +--gamma.cmp
   |
   +--gamma.cm
Each valid Cmajor program must have a main function where the execution of the program starts. The possible signatures of the main function are:
void main();
int main();
void main(int argc, const char** argv);
int main(int argc, const char** argv);
If the return type of the main function is void, exit code 0 is returned to the caller of the program at the end of a normal program execution.
If the return type of the main function is int, the main function must explicitly return a value that is then returned to the caller of the program at the end of a normal program execution.
In the last two signatures argc is the number of program arguments including the name of the program, and argv contains the name of the program and program arguments. By convention argv[0] is the name of the program and argv[1], argv[2], ..., argv[argc − 1] are program arguments.
A solution is a group of related projects that can be built as a unit. By convention a solution has a solution file that ends with the .cms extension. When deciding the build order of the projects in a solution, the Cmajor compiler does a topological sort of the projects using the reference relationship as the sorting criterion. If project A references project B, project B is built before project A.
solution‑file | → | solution solution‑name ; solution‑declarations |
solution‑name | → | qualified‑id |
solution‑declarations | → | solution‑declaration* |
solution‑declaration | → | solution‑project‑declaration | active‑project‑declaration |
solution‑project‑declaration | → | project file‑path ; |
active‑project‑declaration | → | activeProject qualified‑id ; |
Here's a solution file solution.cms that contains three projects Alpha, Beta and Gamma:
solution Solution;
project <alpha/alpha.cmp>;
project <beta/beta.cmp>;
project <gamma/gamma.cmp>;
The directory structure is as follows:
+--solution
   |
   +--solution.cms
   |
   +--alpha
   |  |
   |  +--alpha.cmp
   |  |
   |  +--main.cm
   |
   +--beta
   |  |
   |  +--beta.cmp
   |  |
   |  +--beta.cm
   |
   +--gamma
      |
      +--gamma.cmp
      |
      +--gamma.cm
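The build order rule for solutions (a referenced project is built before the project that references it) amounts to a topological sort of the reference graph. The following C++ sketch shows one way to compute such an order with a depth-first traversal. It is not the Cmajor compiler's code: the ReferenceMap type and the Visit and BuildOrder functions are assumptions made for this illustration, and the sketch relies on the references being acyclic.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps a project name to the names of the projects it references (hypothetical model).
using ReferenceMap = std::map<std::string, std::vector<std::string>>;

// Appends the referenced projects of a project to the order before the project itself.
void Visit(const std::string& project, const ReferenceMap& references,
           std::set<std::string>& visited, std::vector<std::string>& order)
{
    if (!visited.insert(project).second) return; // already placed in the order
    ReferenceMap::const_iterator it = references.find(project);
    if (it != references.end())
    {
        for (const std::string& referenced : it->second)
        {
            Visit(referenced, references, visited, order);
        }
    }
    order.push_back(project);
}

// Returns the projects in an order in which every project follows the projects it references.
std::vector<std::string> BuildOrder(const ReferenceMap& references)
{
    std::set<std::string> visited;
    std::vector<std::string> order;
    for (const auto& entry : references)
    {
        Visit(entry.first, references, visited, order);
    }
    return order;
}
```

For the example solution, where Alpha references Beta and Gamma, this sketch places Beta and Gamma before Alpha.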
The size of a class c may be aligned to a, that is, rounded up to the smallest multiple of a that is equal to or greater than the unaligned size of c.
For example, given class Foo
class Foo
{
    int x;
    byte y;
}
sizeof(Foo) is 8, and not 5, because of the alignment of 4, that is, sizeof(int).
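The rounding described above can be written as a small integer formula. The following C++ sketch is an illustration of the arithmetic only; the function name AlignedSize is an assumption made for this example.

```cpp
#include <cstddef>

// Rounds size up to the smallest multiple of alignment that is
// equal to or greater than size. The alignment must be nonzero.
constexpr std::size_t AlignedSize(std::size_t size, std::size_t alignment)
{
    return (size + alignment - 1) / alignment * alignment;
}
```

For the class Foo above, the unaligned size is 5 and the alignment is 4, so AlignedSize(5, 4) gives 8.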
The number of parameters of a function.
The common type for two basic types is the narrowest type that can contain a value of both types. The following table gives the common type for each pair of basic types:
| | bool | sbyte | byte | short | ushort | int | uint | long | ulong | float | double | char | wchar | uchar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bool | bool | | | | | | | | | | | | | |
| sbyte | | sbyte | short | short | int | int | long | long | | float | double | | | |
| byte | | short | byte | short | ushort | int | uint | long | ulong | float | double | | | |
| short | | short | short | short | int | int | long | long | | float | double | | | |
| ushort | | int | ushort | int | ushort | int | uint | long | ulong | float | double | | | |
| int | | int | int | int | int | int | long | long | | float | double | | | |
| uint | | long | uint | long | uint | long | uint | long | ulong | float | double | | | |
| long | | long | long | long | long | long | long | long | | float | double | | | |
| ulong | | | ulong | | ulong | | ulong | | ulong | float | double | | | |
| float | | float | float | float | float | float | float | float | float | float | double | | | |
| double | | double | double | double | double | double | double | double | double | double | double | | | |
| char | | | | | | | | | | | | char | wchar | uchar |
| wchar | | | | | | | | | | | | wchar | wchar | uchar |
| uchar | | | | | | | | | | | | uchar | uchar | uchar |
Opposite of an abstract class. A concrete class may not have an abstract member function and it must override all abstract member functions it inherits.
If A and B are class types, and A is the same class as B or A is derived directly or indirectly from B, their conversion distance is defined as follows: if A = B, distance(A, B) = 0, otherwise distance(A, B) = 1 + distance(baseClassOf(A), B).
If A and B are class types that have no inheritance relationship, or B is derived from A, we define distance(A, B) = 255.
If A and B are pointer-to-class types, their distance is defined as the distance of the pointed-to types. If A or B or both are reference-to-class types, their distance is defined as the distance of types for which references are removed.
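The recursive definition of the conversion distance between class types can be traced with a small model. In the following C++ sketch a class hierarchy is represented as a map from a class name to the name of its base class; this representation and the names BaseMap and Distance are assumptions made for illustration, not part of the Cmajor implementation.

```cpp
#include <map>
#include <string>

// Maps a class name to the name of its base class; classes without a base class are absent.
using BaseMap = std::map<std::string, std::string>;

// Returns 0 if a and b are the same class, the number of derivation steps
// from a up to b if a derives from b, and 255 otherwise.
int Distance(const std::string& a, const std::string& b, const BaseMap& baseClassOf)
{
    if (a == b) return 0;
    BaseMap::const_iterator it = baseClassOf.find(a);
    if (it == baseClassOf.end()) return 255; // reached a root without finding b
    int baseDistance = Distance(it->second, b, baseClassOf);
    return baseDistance == 255 ? 255 : 1 + baseDistance;
}
```

With a hierarchy in which Truck derives from Car, and Car and Bicycle derive from Vehicle, the distance from Truck to Vehicle is 2, while the distance from Vehicle to Truck and from Bicycle to Car is 255.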
If A and B are basic types, and A is implicitly convertible to B we define their conversion distance by associating integers to the types and computing the distance as the difference of those integers. For example, distance(byte, short) = 1, distance(byte, ushort) = 2, distance(byte, int) = 3, etc.
In Cmajor, objects of the following kinds of types are default initialized as follows:
A memory area managed by the runtime library and eventually by the operating system. A program can allocate memory from the free store and then release it back when it has finished using it.
The name of the function without the parameter list. For example, these functions have all the same group name foo:
public void foo(int x) {}
public void foo(double y) {}
public class Alpha
{
public void foo() {}
}
The name of the function, the types of the parameters of the function, and possibly the constraint of the function.
For example, function
public void foo(int x) {}
has signature foo(int)
Interface Method Table. Each polymorphic class has one IMT per implemented interface. The interface method table contains pointers to implemented interface member functions for the class. Each interface member function has an IMT index that is used for getting the member function pointer from the IMT when doing interface call dispatch.
An enumeration constant, an integer type value or a character type value.
An expression that can appear on the left-hand side of an assignment. The lvalue expressions are:
An entity defined at namespace level. Entities that can be defined at namespace level are:
A class has a nontrivial destructor if it has
The compiler generates a nontrivial destructor for a class if one or more of the following conditions are true:
A class is polymorphic, if one or more of the following conditions are true:
A polymorphic class type has a VMT.
User-defined types are enumerated types, class types, interface types, and delegate and class delegate types.
Virtual Method Table. There is one virtual method table per polymorphic class. A virtual method table contains:
Each virtual, abstract and overridden member function contains a VMT index that is used for getting the member function pointer from the VMT when doing virtual call dispatch.
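The index-based lookup can be pictured with a simplified model. The following C++ sketch is not the Cmajor runtime layout: the types Vmt and Object, the assumption that each object refers to the table of its class, and the functions shown are all made up for this illustration, and the member functions are simplified so that they take no object argument.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified member function pointer type.
using MemberFunction = std::string (*)();

// Hypothetical table: one per polymorphic class, one slot per VMT index.
struct Vmt
{
    std::vector<MemberFunction> slots;
};

// Hypothetical object header that refers to the table of the object's class.
struct Object
{
    const Vmt* vmt;
};

std::string VehicleDescribe() { return "vehicle"; }
std::string CarDescribe() { return "car"; }

// VMT index assumed for the virtual member function Describe.
const std::size_t describeIndex = 0;

// Virtual call dispatch: fetch the function pointer stored at the VMT index and call it.
std::string CallVirtual(const Object& object, std::size_t vmtIndex)
{
    return object.vmt->slots[vmtIndex]();
}
```

An overriding function occupies the same index in the table of the derived class as the overridden function does in the table of the base class, so the same call site reaches whichever function the table of the object's class holds.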
The syntax of the Cmajor programming language is described in this text using a kind of grammar called Parsing Expression Grammar or PEG. The notation used is not exactly the same as PEG notation, but a slightly modified version of it.
In this notation a grammar consists of rules that are of the form:
ruleName | → | ruleBody |
A rule produces a set of strings that forms a language. For example, the rule compile-unit produces a set of strings that form the language of syntactically valid Cmajor source files. The syntax rules alone are not enough to describe what a valid Cmajor program is, though, because a syntactically valid source file may include meaningless constructs. For example, program
void main()
{
    1 = a;
}
is syntactically valid but meaningless, because one cannot assign to a literal. It is the compiler's job to perform semantic analysis and detect this kind of error. In this case the Cmajor compiler produces the following error message:
not an lvalue expression (file 'C:/Users/Seppo/cmajorw64/cmajor/test/foo/main.cm', line 5):
1 = a;
^
The body of a rule consists of parsing expressions that are combined to produce a pattern that describes a set of strings that forms a language.
One of the simplest kind of parsing expression is a keyword. A keyword is represented using bold font. A keyword is a string that appears literally in the produced language. For example, rule int‑rule produces a language that contains one string, "int":
int‑rule | → | int |
Another simple kind of parsing expression is a terminal string. A terminal string is represented using monospace font. A terminal string also appears literally in the produced language. For example, rule parentheses produces a language that contains one string consisting of a left and a right parenthesis:
parentheses | → | () |
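A character class is a parsing expression that produces one character that is in the character class. A character class is represented using serif font and is enclosed in square brackets. A leading ^ inverts the class, so that it produces any one character that is not listed. For example, rule latin‑letter produces a language that contains strings that consist of a single Latin letter, and rule nondigit produces a language that contains strings that consist of a single character that is not a decimal digit: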
latin‑letter | → | [a-zA-Z] |
nondigit | → | [^0-9] |
A nonterminal is a name of a grammar rule. A nonterminal is represented using italic font. A nonterminal produces a language that consists of the strings that the rule it names produces. For example, rule class‑name produces a language that consists of the strings that rule identifier produces:
class‑name | → | identifier |
Sometimes it is not possible to express the syntax using formal rules. In that case the produced strings are described in English. The informal rules are enclosed in apostrophes. For example, rule any‑char produces a language that consists of the strings that consist of single Unicode characters:
any‑char | → | 'any Unicode character' |
The parsing expressions can be combined using grammar operators that are: |, sequence, −, *, +, ?, and ().
If e1 and e2 are parsing expressions, e1 | e2 produces the set of strings that is the union of the set of strings that e1 produces and the set of strings that e2 produces. Another way of thinking about it is that the bar operator (|) separates alternatives. Unlike in general context-free grammars, in Parsing Expression Grammars the first alternative that matches the input always wins, so Parsing Expression Grammars cannot be ambiguous the way general context-free grammars can.
For example, the rule basic‑type produces a language that consists of the names of the Cmajor basic types:
basic‑type | → | bool | sbyte | byte | short | ushort | int | uint | long | ulong | float | double | char | wchar | uchar | void |
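The "first alternative wins" behaviour can be made concrete with a very small sketch. The following Python code is not part of Cmajor or its compiler; the helper names literal and choice are hypothetical, and the two alternatives are simply the keywords do and double, chosen because one is a prefix of the other.

```python
def literal(text):
    """Return a parser that matches `text` at `pos`, or returns None on failure."""
    def parse(source, pos):
        return pos + len(text) if source.startswith(text, pos) else None
    return parse


def choice(*alternatives):
    """Ordered choice: try alternatives left to right and keep the first success."""
    def parse(source, pos):
        for alternative in alternatives:
            end = alternative(source, pos)
            if end is not None:
                return end  # later alternatives are never tried
        return None
    return parse


# Same alternatives, different order (hypothetical rules for illustration).
do_first = choice(literal("do"), literal("double"))
double_first = choice(literal("double"), literal("do"))

print(do_first("double", 0))      # 2: "do" matched first and won
print(double_first("double", 0))  # 6: "double" was tried first
```

Each parser returns the position just after the matched text. The two rules accept the same alternatives, but on the input "double" they consume different amounts of text, which is why the order of alternatives matters in this style of grammar.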
If e1 and e2 are parsing expressions, e1 e2 produces the set of strings that consist of a string that e1 produces concatenated with a string that e2 produces. Another way of thinking about it is that first e1 occurs and then e2 occurs.
For example, rule hex‑digit‑4 produces a language that consists of strings of four hexadecimal digits:
hex‑digit‑4 | → | hex‑digit hex‑digit hex‑digit hex‑digit |
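As a rough illustration of the sequence operation, the rule above can be approximated with Python's re module. This is a sketch only: the definition of hex‑digit is not shown here, so the character class [0-9a-fA-F] is an assumption, and the function name is hypothetical.

```python
import re

# Assumed definition of hex-digit; the actual rule is not shown in this section.
HEX_DIGIT = "[0-9a-fA-F]"

# Sequence: four hex-digit expressions written one after another.
HEX_DIGIT_4 = re.compile(HEX_DIGIT * 4)


def is_hex_digit_4(text):
    """True if the whole of `text` is exactly four hexadecimal digits."""
    return HEX_DIGIT_4.fullmatch(text) is not None


print(is_hex_digit_4("00FF"))   # True
print(is_hex_digit_4("FFF"))    # False, only three digits
print(is_hex_digit_4("12G4"))   # False, G is not a hexadecimal digit
```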
If e1 and e2 are parsing expressions, e1 − e2 produces the set of strings that is the difference of the set of strings that e1 produces and the set of strings that e2 produces. Another way of thinking about it is that e1 occurs but e2 does not occur.
For example, rule identifier produces the strings that rule id‑char‑sequence produces, except for those that rule keyword also produces:
identifier | → | id‑char‑sequence − keyword |
id‑char‑sequence | → | (letter | _) (letter | digit | _)* |
letter | → | [a-zA-Z] |
digit | → | [0-9] |
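The difference operation in the identifier rule can be sketched in Python as "match the character sequence, then reject keywords". This is an illustration under assumptions, not the compiler's implementation: the KEYWORDS set is a small subset of the keyword rule, and is_identifier is a hypothetical helper name.

```python
import re

# Illustrative subset of the keyword rule; the full keyword list is longer.
KEYWORDS = {"class", "int", "return", "while"}

# id-char-sequence: (letter | _) (letter | digit | _)*
ID_CHAR_SEQUENCE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_identifier(text):
    """True if `text` is an id-char-sequence that is not a keyword."""
    return ID_CHAR_SEQUENCE.fullmatch(text) is not None and text not in KEYWORDS


print(is_identifier("count"))  # True
print(is_identifier("_x1"))    # True
print(is_identifier("while"))  # False, it is a keyword
print(is_identifier("1abc"))   # False, it starts with a digit
```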
If e is a parsing expression and λ denotes the empty string, e* produces the strings that the following parsing expressions produce:
λ, e, ee, eee, ...
Another way of thinking about it is that e occurs zero or more times. The name of this operation is Kleene closure.
For example, rule digits produces strings that consist of zero or more decimal digits:
digits | → | [0-9]* |
If e is a parsing expression, e+ produces the strings that the following parsing expressions produce:
e, ee, eee, ...
Another way of thinking about it is that e occurs one or more times.
For example, rule dec‑digit‑sequence produces strings consisting of nonempty sequences of decimal digits:
dec‑digit‑sequence | → | [0-9]+ |
If e is a parsing expression, e? produces the empty string and the strings that parsing expression e produces. Another way of thinking about it is that e may occur but does not have to; it is optional.
For example, rule integer produces strings that may begin with a sign, followed by a nonempty sequence of decimal digits:
integer | → | sign? dec‑digit‑sequence |
sign | → | + | - |
dec‑digit‑sequence | → | [0-9]+ |
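The operators *, + and ? behave the same way in Python regular expressions, so the rules digits, dec‑digit‑sequence and integer can be tried out with a short sketch. Python's re module is a regular expression engine rather than a PEG implementation, and the helper name produces is hypothetical.

```python
import re

DIGITS = re.compile(r"[0-9]*")              # digits: zero or more
DEC_DIGIT_SEQUENCE = re.compile(r"[0-9]+")  # dec-digit-sequence: one or more
INTEGER = re.compile(r"[+-]?[0-9]+")        # integer: sign? dec-digit-sequence


def produces(rule, text):
    """True if `rule` matches the whole of `text`."""
    return rule.fullmatch(text) is not None


print(produces(DIGITS, ""))              # True, * allows zero occurrences
print(produces(DEC_DIGIT_SEQUENCE, ""))  # False, + needs at least one digit
print(produces(INTEGER, "-42"))          # True, the sign is present
print(produces(INTEGER, "42"))           # True, the sign is optional
print(produces(INTEGER, "+"))            # False, the digits are missing
```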
If e is a parsing expression, (e) produces the same strings that e produces. Parentheses may be used to group parsing expressions differently when the precedence of the grammar operators would produce the wrong result. The precedence of | is lowest, then comes the sequence operation, −, and then *, + and ?. This means that, for example, the parsing expression
ab*
produces strings that consist of the character a followed by zero or more b's, because the precedence of the sequence operation is lower than the precedence of *, so operator * binds tighter than the sequence operation.
If one wants to produce the following strings: λ, ab, abab, ababab, ..., this can be achieved by using parentheses:
(ab)*
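Python regular expressions give * the same precedence relative to concatenation, so the difference between the two expressions can be checked with a short sketch. The helper name produces is hypothetical and the code only illustrates the grouping rule.

```python
import re

UNGROUPED = re.compile(r"ab*")   # a followed by zero or more b's
GROUPED = re.compile(r"(ab)*")   # zero or more repetitions of ab


def produces(rule, text):
    """True if `rule` matches the whole of `text`."""
    return rule.fullmatch(text) is not None


print(produces(UNGROUPED, "abbb"))  # True
print(produces(UNGROUPED, "abab"))  # False
print(produces(GROUPED, "abab"))    # True
print(produces(GROUPED, ""))        # True, zero repetitions give the empty string
print(produces(GROUPED, "abb"))     # False
```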
The parentheses used for grouping are slightly taller than parentheses that are terminal characters:
grouping‑parens | → | (()) |