Syntax Nodes
When Cangjie code is compiled, lexical analysis first converts the code into Tokens, and syntax analysis then parses the Tokens into a syntax tree. Each node of the syntax tree can be an expression, a declaration, a type, a pattern, and so on. The Cangjie ast library provides a class for each kind of node, and these classes are related by inheritance. The main abstract classes are as follows:
Node: parent class of all syntax nodes.
TypeNode: parent class of all type nodes.
Expr: parent class of all expression nodes.
Decl: parent class of all declaration nodes.
Pattern: parent class of all pattern nodes.
For more details about the various node types, see Cangjie Programming Language Library API. In the examples below, the following two nodes are used:
BinaryExpr: binary operation expression.
FuncDecl: function declaration.
Node Parsing
With the standard library's std.ast package, almost all types of nodes can be parsed from Tokens. There are two methods to parse Tokens and construct syntax nodes.
Using Functions for Expression and Declaration Parsing
The following functions are used to parse and construct any syntax node from Tokens:
parseExpr(input: Tokens): Expr: parses the input Tokens into an expression node.
parseExprFragment(input: Tokens, startFrom!: Int64 = 0): (Expr, Int64): parses a fragment of the input Tokens into an expression node. The fragment starts at the startFrom index. Parsing may consume only part of the fragment; the function returns the index of the first unconsumed Token. (If the entire fragment is consumed, the returned index is input.size.)
parseDecl(input: Tokens, astKind!: String = ""): Decl: parses the input Tokens into a declaration node. astKind is an additional setting. For details, see Cangjie Programming Language Library API.
parseDeclFragment(input: Tokens, startFrom!: Int64 = 0): (Decl, Int64): parses a fragment of the input Tokens into a declaration node. The startFrom parameter and the returned index have the same meanings as those of parseExprFragment.
parseType(input: Tokens): TypeNode: parses the input Tokens into a type node.
parseTypeFragment(input: Tokens, startFrom!: Int64 = 0): (TypeNode, Int64): parses a fragment of the input Tokens into a type node. The startFrom parameter and the returned index have the same meanings as those of parseExprFragment.
parsePattern(input: Tokens): Pattern: parses the input Tokens into a pattern node.
parsePatternFragment(input: Tokens, startFrom!: Int64 = 0): (Pattern, Int64): parses a fragment of the input Tokens into a pattern node. The startFrom parameter and the returned index have the same meanings as those of parseExprFragment.
If parsing fails, an exception is thrown. This approach suits code snippets whose node type is not known in advance. Because these functions return the abstract types (Expr, Decl, TypeNode, Pattern), you must manually convert the result if you need a specific subtype node.
The following example shows how to use these functions:
let tks1 = quote(a + b)
let tks2 = quote(u + v, x + y)
let tks3 = quote(
func f1(x: Int64) { return x + 1 }
)
let tks4 = quote(
func f2(x: Int64) { return x + 2 }
func f3(x: Int64) { return x + 3 }
)
let binExpr1 = parseExpr(tks1)
let (binExpr2, mid) = parseExprFragment(tks2)
let (binExpr3, _) = parseExprFragment(tks2, startFrom: mid + 1) // Skip the comma.
println("binExpr1 = ${binExpr1.toTokens()}")
println("binExpr2 = ${binExpr2.toTokens()}, binExpr3 = ${binExpr3.toTokens()}")
let funcDecl1 = parseDecl(tks3)
let (funcDecl2, mid2) = parseDeclFragment(tks4)
let (funcDecl3, _) = parseDeclFragment(tks4, startFrom: mid2)
println("${funcDecl1.toTokens()}")
println("${funcDecl2.toTokens()}")
println("${funcDecl3.toTokens()}")
The output is as follows:
binExpr1 = a + b
binExpr2 = u + v, binExpr3 = x + y
func f1(x: Int64) {
return x + 1
}
func f2(x: Int64) {
return x + 2
}
func f3(x: Int64) {
return x + 3
}
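The manual conversion mentioned above can be sketched as follows. This is a minimal illustration rather than a prescribed pattern; it reuses only parseExpr, BinaryExpr, its leftExpr and rightExpr components, and the as ... getOrThrow() conversion that appears elsewhere in this section. Wrapping the code in main() is an assumption made so that the snippet is self-contained.

```cangjie
import std.ast.*

main() {
    // parseExpr returns the abstract type Expr, not a concrete node class.
    let expr: Expr = parseExpr(quote(a + b))

    // Option 1: `as` yields an Option; getOrThrow() throws if expr is not a BinaryExpr.
    let bin1 = (expr as BinaryExpr).getOrThrow()
    println("right operand: ${bin1.rightExpr.toTokens()}")

    // Option 2: a type pattern handles the non-matching case without an exception.
    match (expr) {
        case bin: BinaryExpr => println("left operand: ${bin.leftExpr.toTokens()}")
        case _ => println("not a binary expression")
    }
}
```

The match form is the safer choice when the input Tokens may legitimately parse to some other kind of expression.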
Using the Constructors of a Syntax Node to Parse Tokens
Most syntax nodes support the init(input: Tokens) constructor, which parses the input Tokens into a node of the specified type. For example:
import std.ast.*
let binExpr = BinaryExpr(quote(a + b))
let funcDecl = FuncDecl(quote(func f1(x: Int64) { return x + 1 }))
If parsing fails, an exception is thrown. This approach suits code snippets whose node type is known in advance, so no manual type casting is needed.
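Since a failed parse throws an exception, callers that handle untrusted Tokens may want to guard the constructor call. The sketch below assumes that Tokens for a function call, f(x), cannot be parsed as a BinaryExpr; f and x are hypothetical names. It catches the general Exception type because this section does not name the specific exception class.

```cangjie
import std.ast.*

main() {
    try {
        // f(x) is a call, not a binary operation, so this is expected to fail.
        let binExpr = BinaryExpr(quote(f(x)))
        println("parsed: ${binExpr.toTokens()}")
    } catch (e: Exception) {
        // Reached when the Tokens do not form a binary expression.
        println("parsing failed: ${e.message}")
    }
}
```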
Node Composition
After a node has been parsed from Tokens, you can inspect its components. The components of BinaryExpr and FuncDecl are listed below as examples. For details about the components of other nodes, see Cangjie Programming Language Library API.
BinaryExpr node:
  leftExpr: Expr: expression on the left of the operator
  op: Token: operator
  rightExpr: Expr: expression on the right of the operator
FuncDecl node (partial):
  identifier: Token: function name
  funcParams: ArrayList<FuncParam>: parameter list
  declType: TypeNode: return value type
  block: Block: function body
FuncParam node (partial):
  identifier: Token: parameter name
  paramType: TypeNode: parameter type
Block node (partial):
  nodes: ArrayList<Node>: expressions and declarations in a block
Each component is a public mut prop, so it can be both read and updated. The following examples show the results of such updates.
Case of BinaryExpr
let binExpr = BinaryExpr(quote(x * y))
binExpr.leftExpr = BinaryExpr(quote(a + b))
println(binExpr.toTokens())
binExpr.op = Token(TokenKind.ADD)
println(binExpr.toTokens())
The output is as follows:
(a + b) * y
a + b + y
Initially, parsing produces a binExpr node that represents x * y, as depicted in the following figure.
  *
 / \
x   y
Next, we replace the left node (that is, x) with a + b, which yields the following syntax tree:
    *
   / \
  +   y
 / \
a   b
When the syntax tree is tokenized (with toTokens), parentheses must be added around a + b to obtain (a + b) * y. (If the result were a + b * y, multiplication would be performed before addition, which contradicts the meaning of the syntax tree.) The ast library adds these parentheses automatically when the syntax tree is tokenized.
Finally, we change the operator at the root of the syntax tree from * to +, which yields the following syntax tree:
    +
   / \
  +   y
 / \
a   b
The syntax tree can be tokenized as a + b + y because addition is inherently left-associative and requires no parentheses on the left.
Case of FuncDecl
let funcDecl = FuncDecl(quote(func f1(x: Int64) { x + 1 }))
funcDecl.identifier = Token(TokenKind.IDENTIFIER, "foo")
println("Number of parameters: ${funcDecl.funcParams.size}")
funcDecl.funcParams[0].identifier = Token(TokenKind.IDENTIFIER, "a")
println("Number of nodes in body: ${funcDecl.block.nodes.size}")
let binExpr = (funcDecl.block.nodes[0] as BinaryExpr).getOrThrow()
binExpr.leftExpr = parseExpr(quote(a))
println(funcDecl.toTokens())
In this case, a FuncDecl node is constructed through parsing, and the function name, parameter name, and part of the expression in the function body are modified. The output is as follows:
Number of parameters: 1
Number of nodes in body: 1
func foo(a: Int64) {
a + 1
}
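The components can also be read without modifying them. The following sketch walks over the FuncDecl components listed earlier and prints them. The function add and its parameters are hypothetical, and reading the text of a Token through its value member is an assumption not stated in this section; consult the library API before relying on it.

```cangjie
import std.ast.*

main() {
    let funcDecl = FuncDecl(quote(func add(x: Int64, y: Int64): Int64 { x + y }))

    // identifier is a Token; `value` (assumed here) holds its text.
    println("name: ${funcDecl.identifier.value}")

    // funcParams is an ArrayList<FuncParam>; each entry has a name and a type.
    for (param in funcDecl.funcParams) {
        println("param ${param.identifier.value}: ${param.paramType.toTokens()}")
    }

    // declType is the declared return type; block.nodes holds the body.
    println("return type: ${funcDecl.declType.toTokens()}")
    for (node in funcDecl.block.nodes) {
        println("body node: ${node.toTokens()}")
    }
}
```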
Using quote to Interpolate Syntax Nodes
Any syntax node can be interpolated in a quote statement, and an ArrayList of certain syntax nodes can also be interpolated (mainly for the real-world scenarios in which such node lists occur). Interpolation is written as $(node), where node is an instance of any node type.
Next, we will demonstrate how to interpolate nodes through some cases.
var binExpr = BinaryExpr(quote(1 + 2))
let a = quote($(binExpr))
let b = quote($binExpr)
let c = quote($(binExpr.leftExpr))
let d = quote($binExpr.leftExpr)
println("a: ${a.toTokens()}")
println("b: ${b.toTokens()}")
println("c: ${c.toTokens()}")
println("d: ${d.toTokens()}")
The output is as follows:
a: 1 + 2
b: 1 + 2
c: 1
d: 1 + 2.leftExpr
Generally, the expression after an interpolation operator uses parentheses to limit the scope, for example, $(binExpr). If there is only one identifier, the expression can be written as $binExpr, with the parentheses omitted. In this case, both a and b insert the binExpr node into the quote statement, and the result is 1 + 2. However, if the expression after the interpolation operator is more complex, a scope error may occur if parentheses are not added. For example, the expression binExpr.leftExpr is evaluated as the left part of expression 1 + 2, that is, 1. Therefore, 1 is correctly assigned to c. However, the interpolation in d is interpreted as ($binExpr).leftExpr, so the result is 1 + 2.leftExpr. To specify the scope of interpolation, you are advised to use parentheses in interpolation operations.
The following example shows the interpolation of a node list (ArrayList).
var incrs = ArrayList<Node>()
for (i in 1..=5) {
incrs.add(parseExpr(quote(x += $(i))))
}
var foo = quote(
func foo(n: Int64) {
let x = n
$(incrs)
x
})
println(foo)
The output is as follows:
func foo(n: Int64) {
let x = n
x += 1
x += 2
x += 3
x += 4
x += 5
x
}
In this case, we create a node list incrs that contains the expressions x += 1, ..., x += 5. Interpolating incrs emits the nodes in sequence, each on its own line. This suits scenarios in which expressions and declarations that must be executed sequentially are inserted.
The following example shows that in some cases, parentheses need to be added around the interpolation for correctness.
var binExpr1 = BinaryExpr(quote(x + y))
var binExpr2 = BinaryExpr(quote($(binExpr1) * z)) // Error: x + y * z is obtained.
println("binExpr2: ${binExpr2.toTokens()}")
println("binExpr2.leftExpr: ${binExpr2.leftExpr.toTokens()}")
println("binExpr2.rightExpr: ${binExpr2.rightExpr.toTokens()}")
var binExpr3 = BinaryExpr(quote(($(binExpr1)) * z)) // Correct: (x + y) * z is obtained.
println("binExpr3: ${binExpr3.toTokens()}")
The output is as follows:
binExpr2: x + y * z
binExpr2.leftExpr: x
binExpr2.rightExpr: y * z
binExpr3: (x + y) * z
First, we construct the expression x + y, and then insert the expression into the $(binExpr1) * z template. The intent is to obtain an expression that first computes x + y and then multiplies the result by z. However, the result of interpolation is x + y * z, in which y * z is performed before x is added. This is because interpolation cannot automatically add parentheses to ensure the atomicity of the inserted expression, which differs from the replacement of leftExpr introduced in the previous section. Therefore, parentheses need to be added around $(binExpr1) to ensure result correctness.
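As an alternative to adding parentheses by hand, the node can be assigned to leftExpr, as in the BinaryExpr case of the previous section, so that toTokens inserts the parentheses. The sketch below is one possible way to do this; placeholder is a hypothetical identifier that exists only to be overwritten.

```cangjie
import std.ast.*

main() {
    let sum = BinaryExpr(quote(x + y))

    // Parse a template whose left operand is a throwaway identifier.
    let product = BinaryExpr(quote(placeholder * z))

    // Replace the left operand at the tree level instead of the token level.
    product.leftExpr = sum

    // Tokenizing the tree lets the ast library decide where parentheses are needed.
    println(product.toTokens())
}
```

Given the behavior shown for (a + b) * y earlier, the printed result is expected to keep x + y grouped, but this depends on the same automatic parenthesization.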