Syntax Nodes
During the compilation of the Cangjie language, the code is converted into Tokens
through lexical analysis, and then the syntax of Tokens
is parsed to obtain a syntax tree. A node of each syntax tree could be an expression, declaration, type, pattern, among others. The ast
library of Cangjie provides a class for each node, with an appropriate inheritance relationship among the classes. The main abstract classes are as follows:
Node
: parent class of all syntax nodes.TypeNode
: parent class of all type nodes.Expr
: parent class of all expression nodes.Decl
: parent class of all declaration nodes.Pattern
: parent class of all pattern nodes.
For more details about the various node types, see Cangjie Programming Language Library API. In the examples below, the following two nodes are used:
BinaryExpr
: binary operation expression.FuncDecl
: function declaration.
Node Parsing
The ast
library enables basically each type of node to be parsed from Tokens
. There are two ways to parse a node.
Using Functions for Expression and Declaration Parsing
The following functions can be used to parse any expression or declaration from Tokens
:
parseExpr(input: Tokens): Expr
: Parses the inputTokens
into an expression.parseExprFragment(input: Tokens, startFrom!: Int64 = 0): (Expr, Int64)
: Parses a segment of the inputTokens
into an expression. The segment starts from thestartFrom
index. The parsing may consume only a part of the segment starting from thestartFrom
index and return the index of the first unconsumedToken
. If the entire segment is consumed,input.size
is returned.parseDecl(input: Tokens, astKind!: String = "")
: Parses the inputTokens
into a declaration.astKind
is an additional setting. For details, see Cangjie Programming Language Library API.parseDeclFragment(input: Tokens, startFrom!: Int64 = 0): (Decl, Int64)
: Parses a segment of the inputTokens
into a declaration. The meanings of thestartFrom
parameter and return index are the same as those ofparseExpr
.
The following code cases illustrate how to use these functions.
let tks1 = quote(a + b)
let tks2 = quote(u + v, x + y)
let tks3 = quote(
func f1(x: Int64) { return x + 1 }
)
let tks4 = quote(
func f2(x: Int64) { return x + 2 }
func f3(x: Int64) { return x + 3 }
)
let binExpr1 = parseExpr(tks1)
let (binExpr2, mid) = parseExprFragment(tks2)
let (binExpr3, _) = parseExprFragment(tks2, startFrom: mid + 1) // Skip the comma.
println("binExpr1 = ${binExpr1.toTokens()}")
println("binExpr2 = ${binExpr2.toTokens()}, binExpr3 = ${binExpr3.toTokens()}")
let funcDecl1 = parseDecl(tks3)
let (funcDecl2, mid2) = parseDeclFragment(tks4)
let (funcDecl3, _) = parseDeclFragment(tks4, startFrom: mid2)
println("${funcDecl1.toTokens()}")
println("${funcDecl2.toTokens()}")
println("${funcDecl3.toTokens()}")
The output is as follows:
binExpr1 = a + b
binExpr2 = u + v, binExpr3 = x + y
func f1(x: Int64) {
return x + 1
}
func f2(x: Int64) {
return x + 2
}
func f3(x: Int64) {
return x + 3
}
Using Constructors for Parsing
Most node types support the init(input: Tokens)
constructor, which can parse the input Tokens
to the node of the corresponding type. For example:
import std.ast.*
let binExpr = BinaryExpr(quote(a + b))
let funcDecl = FuncDecl(quote(func f1(x: Int64) { return x + 1 }))
If the parsing fails, an exception is thrown. This parsing method applies to code snippets whose types are known, so manual type casting are not needed.
Node Composition
After a node is parsed based on Tokens
, you can view the components of the node. The components of BinaryExpr
and FuncDecl
are listed as examples. For details about the components of other nodes, see Cangjie Programming Language Library API.
BinaryExpr
node:leftExpr: Expr
: expression on the left of the operatorop: Token
: operatorrightExpr: Expr
: expression on the right of the operator
FuncDecl
node (partial):identifier: Token
: function namefuncParams: ArrayList<FuncParam>
: parameter listdeclType: TypeNode
: return value typeblock: Block
: function body
FuncParam
node (partial):identifier: Token
: parameter nameparamType: TypeNode
: parameter type
Block
node (partial):nodes: ArrayList<Node>
: expressions and declarations in a block
Each component is a public mut prop
and can be viewed and updated. Here are some examples of update results.
Case of BinaryExpr
let binExpr = BinaryExpr(quote(x * y))
binExpr.leftExpr = BinaryExpr(quote(a + b))
println(binExpr.toTokens())
binExpr.op = Token(TokenKind.ADD)
println(binExpr.toTokens())
The output is as follows:
(a + b) * y
a + b + y
To begin with, it is identified through parsing that binExpr
represents the node x * y
, as depicted in the following figure.
*
/ \
x y
After that, we replace the left node (that is, x
) with a + b
, which obtains the following syntax tree:
*
/ \
+ y
/ \
a b
When the syntax tree is tokenized (with toTokens
), parentheses must be added around a + b
to obtain (a + b) * y
. (If a + b * y
is a result, multiplication is performed before addition, which contradicts the connotation of the syntax tree.) The ast
library can automatically add parentheses when the syntax tree is tokenized.
Finally, we replace the operator at the root of the syntax tree from *
to +
to obtain the following syntax tree:
+
/ \
+ y
/ \
a b
The syntax tree can be tokenized as a + b + y
because addition is inherently left-associative and requires no parentheses on the left.
Case of FuncDecl
let funcDecl = FuncDecl(quote(func f1(x: Int64) { x + 1 }))
funcDecl.identifier = Token(TokenKind.IDENTIFIER, "foo")
println("Number of parameters: ${funcDecl.funcParams.size}")
funcDecl.funcParams[0].identifier = Token(TokenKind.IDENTIFIER, "a")
println("Number of nodes in body: ${funcDecl.block.nodes.size}")
let binExpr = (funcDecl.block.nodes[0] as BinaryExpr).getOrThrow()
binExpr.leftExpr = parseExpr(quote(a))
println(funcDecl.toTokens())
In this case, a FuncDecl
node is constructed through parsing, and the function name, parameter name, and part of the expression in the function body are modified. The output is as follows:
Number of parameters: 1
Number of nodes in body: 1
func foo(a: Int64) {
a + 1
}
Using the quote Interpolation Syntax Node
Any node of an abstract syntax tree (AST) can be interpolated in the quote
statement, and the ArrayList
lists of some AST nodes can also be interpolated (corresponding to scenarios where such node lists appear in practice). Interpolation can be directly expressed by $(node)
, where node
is an instance of any node type.
Next, we will demonstrate how to interpolate nodes through some cases.
var binExpr = BinaryExpr(quote(1 + 2))
let a = quote($(binExpr))
let b = quote($binExpr)
let c = quote($(binExpr.leftExpr))
let d = quote($binExpr.leftExpr)
println("a: ${a.toTokens()}")
println("b: ${b.toTokens()}")
println("c: ${c.toTokens()}")
println("d: ${d.toTokens()}")
The output is as follows:
a: 1 + 2
b: 1 + 2
c: 1
d: 1 + 2.leftExpr
Generally, the expression after an interpolation operator uses parentheses to limit the scope, for example, $(binExpr)
. If there is only one identifier, the expression can be written as $binExpr
, with the parentheses omitted. In this case, both a
and b
insert the binExpr
node into the quote
statement, and the result is 1 + 2
. However, if the expression after the interpolation operator is more complex, a scope error may occur if parentheses are not added. For example, the expression binExpr.leftExpr
is evaluated as the left part of expression 1 + 2
, that is, 1
. Therefore, 1
is correctly assigned to c
. However, the interpolation in d
is interpreted as ($binExpr).leftExpr
, so the result is 1 + 2.leftExpr
. To specify the scope of interpolation, you are advised to use parentheses in interpolation operations.
The following example shows the interpolation of a node list (ArrayList
).
var incrs = ArrayList<Node>()
for (i in 1..=5) {
incrs.append(parseExpr(quote(x += $(i))))
}
var foo = quote(
func foo(n: Int64) {
let x = n
$(incrs)
x
})
println(foo)
The output is as follows:
func foo(n: Int64) {
let x = n
x += 1
x += 2
x += 3
x += 4
x += 5
x
}
In this case, we create a node list incrs
that contains expressions x += 1
, ..., x += 5
. The interpolation of incrs
lists the nodes in sequence and wraps the line for each node. This applies to the scenario where expressions and declarations that need to be executed sequentially are inserted.
The following example shows that in some cases, parentheses need to be added around the interpolation for correctness.
var binExpr1 = BinaryExpr(quote(x + y))
var binExpr2 = BinaryExpr(quote($(binExpr1) * z)) // Error: x + y x z is obtained.
println("binExpr2: ${binExpr2.toTokens()}")
println("binExpr2.leftExpr: ${binExpr2.leftExpr.toTokens()}")
println("binExpr2.rightExpr: ${binExpr2.rightExpr.toTokens()}")
var binExpr3 = BinaryExpr(quote(($(binExpr1)) * z)) / Correct: (x + y) x z is obtained.
println("binExpr3: ${binExpr3.toTokens()}")
The output is as follows:
binExpr2: x + y * z
binExpr2.leftExpr: x
binExpr2.rightExpr: y * z
binExpr3: (x + y) * z
First, we construct the expression x + y
, and then insert the expression into the $(binExpr1) * z
template. The intent is to obtain an expression that first computes x + y
and then multiplies the result by z
. However, the result of interpolation is x + y * z
, in which y * z
is performed before x
is added. This is because interpolation cannot automatically add parentheses to ensure the atomicity of the inserted expression, which differs from the replacement of leftExpr
introduced in the previous section. Therefore, parentheses need to be added around $(binExpr1)
to ensure result correctness.