Macros

Macros can be seen as a kind of "code abbreviation" or as a way to extend the language syntax. During compilation or program execution, the abbreviation is replaced with the actual code; this replacement process is called macro expansion. Macros make it possible to express recurring functionality in uniform, concise code. Cangjie provides procedural macros, which are expanded in the syntax analysis phase. More easy-to-use and expressive forms of macro definition, including late-stage macros and template macros, are planned for the future.

Procedural Macros

A Cangjie procedural macro processes and transforms an input token sequence and outputs another token sequence. The input token sequence is generated by the lexical analyzer and must comply with the lexical rules of Cangjie. The output token sequence must conform to the syntax and semantics of Cangjie, that is, it must form a valid Cangjie program.

The following example shows how procedural macros work. It calls a DebugLog macro with expensiveComputation() as its argument. At compile time, the macro determines whether the program is configured to run in development mode or in production mode. In development mode, expensiveComputation() is run and the debugging output is printed to help detect and locate problems. In production mode, we want neither the debug output nor the performance cost of running the expensive computation.

@DebugLog( expensiveComputation() )

The macro DebugLog can be implemented like this:

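public macro DebugLog(input: Tokens) {
    // globalConfig and Mode are assumed to be defined elsewhere and known at compile time
    if (globalConfig.mode == Mode.development) {
        // development mode: wrap the input tokens in a call to println
        return quote( println( ${input} ) )
    }
    else {
        // any other mode: expand to nothing
        return quote()
    }
}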

The macro definition syntax of Cangjie is similar to the function definition syntax. A macro's parameter can only be a token sequence (that is, a value of type Tokens), and its return value is the transformed token sequence, which is the code generated by the macro call (that is, the macro expansion). In the preceding example, in development mode the returned token sequence wraps the input in a call to println, so the input expression is evaluated and its result is printed. In any other mode, an empty sequence is returned: the input is discarded and no code is generated.
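The two expansion outcomes can be modeled outside Cangjie. The following is a minimal sketch in Python, not Cangjie's implementation: tokens are represented as plain strings, and the function name debug_log and the boolean development flag are hypothetical stand-ins for the DebugLog macro and the compile-time configuration check.

```python
from typing import List

def debug_log(tokens: List[str], development: bool) -> List[str]:
    """Model of DebugLog: wrap the input in println(...) or drop it entirely."""
    if development:
        # Corresponds to quote( println( ${input} ) )
        return ["println", "("] + tokens + [")"]
    # Corresponds to quote(): an empty sequence, so no code is generated
    return []

# Token sequence for the argument expensiveComputation()
call_tokens = ["expensiveComputation", "(", ")"]

dev_expansion = debug_log(call_tokens, development=True)
prod_expansion = debug_log(call_tokens, development=False)

print(" ".join(dev_expansion))  # println ( expensiveComputation ( ) )
print(prod_expansion)           # []
```

In the production case the tokens for expensiveComputation() never reach the rest of the compiler, which is why the computation costs nothing at run time.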

Late-stage Macros and Template Macros

The following describes two kinds of macros that are under development, late-stage macros and template macros, which will be released in later versions of Cangjie.

The input token sequence of the procedural macros described above does not contain the semantic information of the program. In some cases, a macro definition may need to act on the type of a variable or on class and interface declaration information. Procedural macros cannot provide this capability. Consider the following program:

@FindType
var x1: Employee = Employee("Fred Johnson")
// getting the type info of `x1`: easy, it is right there

@FindType
var x2 = Employee("Bob Houston")
// getting the type info of `x2`: hard, requires type inference

Assume that the macro FindType obtains the types of the variables x1 and x2 and prints them or adds them to a log. The type of x1, Employee, appears in the declaration and can be extracted from the input token sequence. The type of x2, however, is not written in the declaration, so it cannot be read directly from the input token sequence; it has to be obtained through type inference. Because procedural macro expansion occurs in the syntax analysis phase, before type inference has been performed, the type of x2 is unavailable at that point. Late-stage macros are expanded after type inference, so they can obtain and use semantic information about the program, including type information.
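The limitation can be illustrated with a small model. The sketch below is written in Python rather than Cangjie and works on hand-written lists of token strings; the helper find_declared_type is hypothetical and only recognizes the simple declaration shape used in the example above. It shows that a purely token-based macro can read an explicit annotation but has nothing to read when the annotation is omitted.

```python
from typing import List, Optional

def find_declared_type(tokens: List[str]) -> Optional[str]:
    """Return the annotated type of a `var` declaration, or None if absent."""
    # Expected shape: var <name> [: <Type>] = <initializer...>
    if len(tokens) >= 4 and tokens[0] == "var" and tokens[2] == ":":
        return tokens[3]
    # No annotation in the token sequence; only type inference could tell.
    return None

x1_tokens = ["var", "x1", ":", "Employee", "=",
             "Employee", "(", '"Fred Johnson"', ")"]
x2_tokens = ["var", "x2", "=",
             "Employee", "(", '"Bob Houston"', ")"]

type_x1 = find_declared_type(x1_tokens)
type_x2 = find_declared_type(x2_tokens)

print(type_x1)  # Employee
print(type_x2)  # None
```

A token-level macro could try to guess the type of x2 from the initializer, but in general the initializer is an arbitrary expression whose type is known only after type inference.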

Late-stage macros can generate code based on type information and on non-local definitions in the code. This is a powerful capability that extends what macros can process. It comes with a trade-off, however: a piece of code whose types are already known cannot be changed fundamentally, so late-stage macros are more limited in what they can express.

If a macro can be thought of as a structured rewriting of some source code into some target code, then a template macro is a better choice than an ordinary procedural macro:

public template macro unless {
    template (cond: Expr, block: Block) {
        @unless (cond) block
            =>
        if (! cond) block
    }
}

The preceding template macro definition can be used to write the following program:

@unless (x > 0) {
    print("x not greater than 0")
}

During macro expansion, the macro call is matched against the template in the template macro definition, cond and block are extracted, and the call is rewritten into the following:

if (!(x > 0)) {
    print("x not greater than 0")
}

The strength of template macros comes from the fact that they describe the intended source code and target code directly, putting the focus on the central transformation. A procedural macro could do the same thing, but it would require tedious and perhaps error-prone code to describe the same transformation that a template macro describes directly.
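The match-and-rewrite step of the unless template can be modeled as follows. This is a minimal sketch in Python under stated assumptions, not the planned Cangjie mechanism: the classes Unless and If and the functions expand_unless and render are hypothetical, and cond and block are kept as source text. The sketch parenthesizes the negated condition so that the rendered text keeps the grouping that a tree-based substitution of cond would preserve.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unless:
    cond: str   # source text of the condition expression
    block: str  # source text of the body block

@dataclass(frozen=True)
class If:
    cond: str
    block: str

def expand_unless(node: Unless) -> If:
    """Apply the rewrite `@unless (cond) block => if (! cond) block`."""
    # Negate the condition as a whole, not just its first operand.
    return If(cond=f"!({node.cond})", block=node.block)

def render(node: If) -> str:
    """Turn the rewritten node back into source text."""
    return f"if ({node.cond}) {node.block}"

source = Unless(cond="x > 0", block='{ print("x not greater than 0") }')
expanded = render(expand_unless(source))

print(expanded)  # if (!(x > 0)) { print("x not greater than 0") }
```

The rewrite itself is a single line in expand_unless, which mirrors how a template macro states the source and target shapes directly instead of assembling token sequences by hand.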