Practical Cases

Fast Power Calculation

Here is a simple example about how to perform evaluation with macros during compilation and generate optimized code. In calculating the power n ^ e, if e is a (large) integer, you can accelerate the calculation by repeated squaring instead of iterative multiplying. This calculation method can be implemented by using the while loop. For example:

func power(n: Int64, e: Int64) {
    var result = 1
    var vn = n
    var ve = e
    while (ve > 0) {
        if (ve % 2 == 1) {
            result *= vn
        }
        ve /= 2
        if (ve > 0) {
            vn *= vn
        }
    }
    result
}

However, the implementation needs to analyze the value of e each time, and judge and update the ve for multiple times in the loop and condition judgment. In addition, the implementation supports only n of the Int64 type. To support n of other types, the problem of how to express result = 1 needs to be solved. If the value of e is known in advance, the code can be written in a simpler way. For example, if the value of e is 10, the entire loop can be expanded as follows:

func power_10(n: Int64) {
    var vn = n
    vn *= vn         // vn = n ^ 2
    var result = vn  // result = n ^ 2
    vn *= vn         // vn = n ^ 4
    vn *= vn         // vn = n ^ 8
    result *= vn     // result = n ^ 10
    result
}

Certainly, it is cumbersome to compile the code manually. Therefore, it is hoped that the code can be automatically generated after the value of e is given. This is where macro comes in. Let's look at a use case.

public func power_10(n: Int64) {
    @power[10](n)
}

According to the .macrocall file, the code expanded by the macro is as follows:

public func power_10(n: Int64) {
    /* ===== Emitted by MacroCall @power in main.cj:20:5 ===== */
    /* 20.1 */var _power_vn = n
    /* 20.2 */_power_vn *= _power_vn
    /* 20.3 */var _power_result = _power_vn
    /* 20.4 */_power_vn *= _power_vn
    /* 20.5 */_power_vn *= _power_vn
    /* 20.6 */_power_result *= _power_vn
    /* 20.7 */_power_result
/* ===== End of the Emit ===== */
}

Next, we will take a look at the implementation of the @power macro.

macro package define

import std.ast.*
import std.convert.*

public macro power(attrib: Tokens, input: Tokens) {
    let attribExpr = parseExpr(attrib)
    if (let Some(litExpr) <- attribExpr as LitConstExpr) {
        let lit = litExpr.literal
        if (lit.kind != TokenKind.INTEGER_LITERAL) {
            diagReport(DiagReportLevel.ERROR, attrib,
                       "Attribute must be integer literal",
                       "Expected integer literal")
        }
        var n = Int64.parse(lit.value)
        var result = quote(var _power_vn = $(input)
        )
        var flag = false
        while (n > 0) {
            if (n % 2 == 1) {
                if (!flag) {
                    result += quote(var _power_result = _power_vn
                    )
                    flag = true
                } else {
                    result += quote(_power_result *= _power_vn
                    )
                }
            }
            n /= 2
            if (n > 0) {
                result += quote(_power_vn *= _power_vn
                )
            }
        }
        result += quote(_power_result)
        return result
    } else {
        diagReport(DiagReportLevel.ERROR, attrib,
                   "Attribute must be integer literal",
                   "Expected integer literal")
    }
    return input
}

Here is the interpretation of the code:

First, the attribute attrib is checked to be an integer literal, otherwise an error is reported with diagReport. The literal is then parsed to an integer n.
result is assumed as the resulting code accumulator and the var _power_vn declaration is added. To avoid variable name conflicts, _power_vn is used as the name.
Then, the while loop starts. The Boolean variable flag indicates whether var _power_result has been initialized. The structure of the remaining code is similar to that of the power function. The difference lies in that the while loop and the if judgment are used to determine the generated code during compilation instead of operations. The final code is generated by properly combining _power_result *= _power_vn with _power_vn *= _power_vn.
Finally, the code for returning _power_result is added.

Save the code to the macros/power.cj file and add the following test code to the main.cj file:

public func power_10(n: Int64) {
    @power[10](n)
}

main() {
    let a = 3
    println(power_10(a))
}

The output is as follows:

Memoize Macro

Memoize is a commonly used method in dynamic programming algorithms. It stores the result of a sub-problem that has been calculated. When the same sub-problem appears again, the result can be directly obtained by querying the table, avoiding repeated calculation and boosts the algorithm efficiency.

Generally, developers need to the storage and extraction functions of Memoize are implemented manually. Macros allow the process to be automatic. The macro effect is as follows:

@Memoize[true]
func fib(n: Int64): Int64 {
    if (n == 0 || n == 1) {
        return n
    }
    return fib(n - 1) + fib(n - 2)
}

main() {
    let start = DateTime.now()
    let f35 = fib(35)
    let end = DateTime.now()
    println("fib(35): ${f35}")
    println("execution time: ${(end - start).toMicroseconds()} us")
}

In the preceding code, the fib function is implemented in a simple recursive mode. If @Memoize[true] is removed, the runtime of this function will grow exponentially with n. For example, if you remove @Memoize[true] from the preceding code or change true to false, the execution result of the main function will become:

fib(35): 9227465
execution time: 199500 us

If the @Memoize[true] is restored. The result will be:

fib(35): 9227465
execution time: 78 us

The same result and drastically reduced computation time indicate that the memorized effect achieved by the use of @Memoize.

To help you understand the principle of @Memoize, the results of the macro expansion of the preceding fib function are displayed, which are obtained from the .macrocall file but formatted for better readability.

import std.collection.*

var memoizeFibMap = HashMap<Int64, Int64>()

func fib(n: Int64): Int64 {
    if (memoizeFibMap.contains(n)) {
        return memoizeFibMap.get(n).getOrThrow()
    }

    let memoizeEvalResult = { =>
        if (n == 0 || n == 1) {
            return n
        }

        return fib(n - 1) + fib(n - 2)
    }()
    memoizeFibMap.put(n, memoizeEvalResult)
    return memoizeEvalResult
}

The execution process of the preceding code is described as follows:

First, memoizeFibMap is defined as a hash table from Int64 to Int64. The first Int64 corresponds to the type of the unique parameter of fib, and the second corresponds to the type of the return value of fib.
Second, the input parameter in the function body is checked. If the parameter is in the memoizeFibMap, the value stored in the hash table is returned immediately. Otherwise, the original fib function body is used to obtain the calculation result. Here, an anonymous function (without parameters) is used to avoid changing the fib function body and allow any way to return from the fib function (in this case, the return in the middle and return of the last expression).
Finally, the calculation result is stored in memoizeFibMap and returned.

Such a "template" makes it easy to understand the implementation of the following macros. The complete code is provided as follows:

public macro Memoize(attrib: Tokens, input: Tokens) {
    if (attrib.size != 1 || attrib[0].kind != TokenKind.BOOL_LITERAL) {
        diagReport(DiagReportLevel.ERROR, attrib,
                   "Attribute must be a boolean literal (true or false)",
                   "Expected boolean literal (true or false) here")
    }

    let memoized = (attrib[0].value == "true")
    if (!memoized) {
        return input
    }

    let fd = FuncDecl(input)
    if (fd.funcParams.size != 1) {
        diagReport(DiagReportLevel.ERROR, fd.lParen + fd.funcParams.toTokens() + fd.rParen,
                   "Input function to memoize should take exactly one argument",
                   "Expect only one argument here")
    }

    let memoMap = Token(TokenKind.IDENTIFIER, "_memoize_" + fd.identifier.value + "_map")
    let arg1 = fd.funcParams[0]

    return quote(
        var $(memoMap) = HashMap<$(arg1.paramType), $(fd.declType)>()

        func $(fd.identifier)($(arg1)): $(fd.declType) {
            if ($(memoMap).contains($(arg1.identifier))) {
                return $(memoMap).get($(arg1.identifier)).getOrThrow()
            }

            let _memoize_eval_result = { => $(fd.block.nodes) }()
            $(memoMap).put($(arg1.identifier), _memoize_eval_result)
            return _memoize_eval_result
        }
    )
}

First, the validity of the attribute and input is checked. The attribute must be a Boolean literal. If the result is false, the input is directly returned. In other cases, the input must be able to be parsed as a function declaration (FuncDecl) and must contain exactly one parameter. After that, the variable of the hash table is generated, with a variable name that avoids conflicts. Finally, the quote template is used to generate the return code, in which the variable name of the hash table, the name and type of the unique parameter, and the return type of the input function are incorporated.

Extension of a Dprint Macro

At the beginning of this section, a macro for printing expressions is used as an example. However, the macro accepts only one expression at a time. Hence, we want to extend the macro to accept multiple expressions separated by commas. The following displays how to realize this function through parseExprFragment.

The macro is implemented as follows:

public macro dprint2(input: Tokens) {
    let exprs = ArrayList<Expr>()
    var index: Int64 = 0
    while (true) {
        let (expr, nextIndex) = parseExprFragment(input, startFrom: index)
        exprs.append(expr)
        if (nextIndex == input.size) {
            break
        }
        if (input[nextIndex].kind != TokenKind.COMMA) {
            diagReport(DiagReportLevel.ERROR, input[nextIndex..nextIndex+1],
                       "Input must be a comma-separated list of expressions",
                       "Expected comma")
        }
        index = nextIndex + 1 // Skip commas
    }
    let result = quote()
    for (expr in exprs) {
        result.append(quote(
            print($(expr.toTokens().toString()) + " = ")
            println($(expr))
        ))
    }
    return result
}

Use case:

let x = 3
let y = 2
@dprint2(x, y, x + y)

The output is as follows:

x = 3
y = 2
x + y = 5

In implementing the macro, the while loop is used to parse each expression successively from index 0. The index variable stores the current parsing location. Each parseExprFragment call starts from current parsing position, and the latest position after parsing, together with the expression obtained, is returned. If the position after parsing reaches the end of the input, the loop exits. In other cases, the loop checks whether the reached position is a comma. If it is not a comma, the loop reports an error and exits. Otherwise, the loop skips the comma and starts the next round of parsing. After the list of expressions is obtained, each expression is output in sequence.

A Simple DSL

This case will demonstrate how to use macros to implement a simple domain specific language (DSL). Language integrated query (LINQ) is a component of the Microsoft .NET framework. It offers a unified data query syntax and allows developers to use SQL-like query statements to operate data sources. Here, we only demonstrate how to support a simplest LINQ syntax.

The syntax to be supported is as follows:

from <variable> in <list> where <condition> select <expression>

Where, variable is an identifier, and list, condition, and expression are all expressions. Therefore, the strategy for implementing a macro is to extract the identifier and expressions and check whether the keywords in the middle are correct. Finally, a query result composed of the extracted parts is generated.

The macro is implemented as follows:

public macro linq(input: Tokens) {
    let syntaxMsg = "Syntax is \"from <attrib> in <table> where <cond> select <expr>\""
    if (input.size == 0 || input[0].value != "from") {
        diagReport(DiagReportLevel.ERROR, input[0..1], syntaxMsg,
                   "Expected keyword \"from\" here.")
    }
    if (input.size <= 1 || input[1].kind != TokenKind.IDENTIFIER) {
        diagReport(DiagReportLevel.ERROR, input[1..2], syntaxMsg,
                   "Expected identifier here.")
    }
    let attribute = input[1]
    if (input.size <= 2 || input[2].value != "in") {
        diagReport(DiagReportLevel.ERROR, input[2..3], syntaxMsg,
                   "Expected keyword \"in\" here.")
    }
    var index: Int64 = 3
    let (table, nextIndex) = parseExprFragment(input, startFrom: index)
    if (nextIndex == input.size || input[nextIndex].value != "where") {
        diagReport(DiagReportLevel.ERROR, input[nextIndex..nextIndex+1], syntaxMsg,
                   "Expected keyword \"where\" here.")
    }
    index = nextIndex + 1 // Skip where
    let (cond, nextIndex2) = parseExprFragment(input, startFrom: index)
    if (nextIndex2 == input.size || input[nextIndex2].value != "select") {
        diagReport(DiagReportLevel.ERROR, input[nextIndex2..nextIndex2+1], syntaxMsg,
                   "Expected keyword \"select\" here.")
    }
    index = nextIndex2 + 1 // Skip select
    let (expr, nextIndex3) = parseExprFragment(input, startFrom: index)

    return quote(
        for ($(attribute) in $(table)) {
            if ($(cond)) {
                println($(expr))
            }
        }
    )
}

Use case:

@linq(from x in 1..=10 where x % 2 == 1 select x * x)

In this example, odd numbers are filtered out from a list of integers from 1 to 10, and the squares of all odd numbers are returned. The output is as follows:

As can be seen, the macro implementation is largely associated with parsing and validating the input tokens, which is crucial for the availability of macros. The syntax of the LINQ language (and most DSLs) in reality is more complex. Therefore, a complete set of parsing mechanisms is required to determine the content to be parsed by identifying different keywords or connectors.