Program Structure

Cangjie programs are typically written as .cj text files, know collectively as source code. During the final stage of development, the program's source code is compiled into binary files in a specific format, ready for execution.

The top-level scope of a Cangjie program can include definitions of variables, functions, and user-defined types such as struct, class, enum, and interface. These variables and functions are referred to as global variables and global functions, respectively. To compile a Cangjie program into an executable, a main function must be defined as the program's entry point within the top-level scope. The main function may either accept a parameter of type Array<String> or have no parameters at all. Its return type can be either an integer or Unit.

Note:

The definition of the main function does not require the func modifier. Command line parameters when the program is started are obtained via an Array<String> type parameter of main.

For example, in the following program, global variable a, global function b, user-defined types C, D, and E, and the main function that serves as the program entry point are all defined in the top-level scope.

// example.cj
let a = 2023
func b() {}
struct C {}
class D {}
enum E { F | G }

main() {
    println(a)
}

The user-defined types mentioned earlier cannot be defined in a non-top-level scope. However, local variables and local functions can be defined within such scopes. Variables and functions defined within user-defined types are referred to as member variables and member functions, respectively.

Note:

enum and interface permit only the definition of member functions but not member variables.

For example, in the following program, the global function a and user-defined type A are defined in the top-level scope, the local variable b and local function c are defined in the function a, and the member variable b and member function c are defined in the user-defined type A.

// example.cj
func a() {
    let b = 2023
    func c() {
        println(b)
    }
    c()
}

class A {
    let b = 2024
    public func c() {
        println(b)
    }
}

main() {
    a()
    A().c()
}

Running the preceding program outputs:

2023
2024

Variables

In Cangjie, a variable consists of a variable name, data (value), and several attributes. Developers can access the data corresponding to the variable through the variable name. The access operation must comply with the constraints imposed by the associated attributes (such as the data type, mutability, and visibility).

The form of variable definition is as follows:

Modifier Variable name: Variable type = Initial value

Modifier is used to set various attributes of the variable. There can be one or more modifiers. Common modifiers are as follows:

  • Mutability modifiers: let and var, which correspond to the immutable and mutable attributes, respectively. Mutability determines whether the value of a variable can be modified after initialization. Therefore, Cangjie variables are classified into immutable variables and mutable variables.
  • Visibility modifiers: such as private and public, which affect the reference scope of global variables and member variables. For details, see subsequent sections.
  • Static modifier: static, which affects the storage and reference modes of member variables. For details, see subsequent sections.

Mutability modifiers are mandatory in Cangjie variable definition. You can add other modifiers as required.

  • Variable name must be a valid Cangjie identifier.
  • Variable type specifies the type of data held by the variable. When the initial value has an explicit type, a variable type is not required, as the compiler can automatically infer it.
  • Initial value is a Cangjie expression used for variable initialization. If the variable type is specified, the initial value type must have the same type as the variable type. When defining a global variable or a static member variable, an initial value is mandatory. When defining a local variable or an instance member variable, the initial value is optional. However, the variable type and value must be in place before the variable can be referenced. Otherwise, an error will be reported during compilation.

For example, in the following program, an immutable variable a and a mutable variable b are defined, both of the Int64 type. Then, the value of b is modified and the println function is called to print the values of a and b.

main() {
    let a: Int64 = 20
    var b: Int64 = 12
    b = 23
    println("${a}${b}")
}

The following output is produced after the program is compiled and executed:

2023

If the value of an immutable variable is attempted to be modified, an error will be reported during compilation. For example:

main() {
    let pi: Float64 = 3.14159
    pi = 2.71828 // Error, cannot assign to immutable value
}

When the initial value is of an explicit type, the variable type is not required. For example:

main() {
    let a: Int64 = 2023
    let b = a
    println("a - b = ${a - b}")
}

The type of the variable b can be automatically inferred as Int64 from the type of its initial value a. Therefore, the program can be compiled and executed without errors, and the following output is produced:

a - b = 0

A local variable may be left uninitialized. However, an initial value must be assigned before the variable is referenced. For example:

main() {
    let text: String
    text = "The invention of Chinese characters by Cangjie"
    println(text)
}

The following output is produced when the program is compiled and executed:

The invention of Chinese characters by Cangjie

Global variables and static member variables must be initialized when they are defined. Otherwise, an error will be reported during compilation. For example:

// example.cj
let global: Int64 // Error, variable in top-level scope must be initialized
// example.cj
class Player {
    static let score: Int32 // Error, static variable 'score' needs to be initialized when declaring
}

Value-type and Reference-type Variables

During program execution, only the flow of instructions and data transformations occur, while the identifiers used in the program no longer exist in a tangible form. This indicates that the compiler employs a mechanism to bind names to the underlying data entities or memory spaces used during execution.

From the perspective of compiler implementation, a variable is always associated with a value (generally through a memory address or register). A variable whose value is directly used is a value-type variable. A variable whose value is used as an index to retrieve the data represented by the index is a reference-type variable. Variables of value type are usually allocated on the thread stack, with each variable having its own data copy. Variables of the reference type are usually allocated on the process heap, and multiple variables can reference the same data object. In this case, operations performed on a reference-type variable may affect other variables.

From the perspective of the language, a variable of value type exclusively occupies the data or storage space bound to it, whereas a variable of reference type shares the data or storage space bound to it with other variables of reference type.

Based on the preceding principles, there are some behavioral differences to note when value-type variables and reference-type variables are used:

  1. When a value is assigned to a value-type variable, a copy operation is usually performed, and the originally bound data or storage space is overwritten. When a value is assigned to a reference-type variable, only the reference is modified, and the originally bound data or storage space is not overwritten.
  2. Variables defined by let cannot be assigned values after being initialized. For a reference-type variable, this means that the reference itself cannot be modified, but the referenced data can be modified.

In Cangjie, reference-type variables include class and Array, and value-type variables include other basic data types and struct.

For example, the following program demonstrates the behavioral differences between struct and class variables:

struct Copy {
    var data = 2012
}

class Share {
    var data = 2012
}

main() {
    let c1 = Copy()
    var c2 = c1
    c2.data = 2023
    println("${c1.data}, ${c2.data}")

    let s1 = Share()
    let s2 = s1
    s2.data = 2023
    println("${s1.data}, ${s2.data}")
}

Running the program produces the following output:

2012, 2023
2023, 2023

The above shows that, for a value-type Copy variable, a copy of the Copy instance is always obtained during value assignment, for example, in c2 = c1. Subsequent modifications to the value of c2 will not affect the value of c1. For a reference-type Share variable, a reference between the variable and the instance is established during value assignment, for example, in s2 = s1. Subsequent modifications to the value of s2 will affect the value of s1.

If var c2 = c1 in the preceding programs is changed to let c2 = c1, an error will be reported during compilation. For example:

struct Copy {
    var data = 2012
}

main() {
    let c1 = Copy()
    let c2 = c1
    c2.data = 2023 // Error, cannot assign to immutable value
}

Scopes

In the previous section, we discussed how to name Cangjie program elements. In addition to variables, both functions and user-defined types can also be named. These names are then used to reference and access the corresponding program elements within the code.

In actual applications, however, the following special cases need to be considered:

  • When a program is large, short names are more likely to be used repeatedly, causing name conflicts.
  • Considering runtime factors, certain program elements may be invalid in specific code snippets, and referencing them can lead to runtime errors.
  • In some logical constructs, to express the inclusion relationship between elements, a child element should not be accessed directly by its name, but indirectly by its parent element name.

To address these issues, modern programming languages introduce the concept of scope, which limits the binding of names to program elements within a specific range. Scopes can be parallel, unrelated, nested, or enclosed. A scope defines the set of names that can be accessed within a given context. The rules governing scope are as follows:

  1. The binding relationship between a program element and a name defined in a scope is valid in the scope and its inner scope. The name can be used to directly access the corresponding program element.
  2. The binding relationship between a program element and a name defined in an inner scope is invalid in an outer scope.
  3. An inner scope can use the name in an outer scope to redefine the binding relationship. According to rule 1, the name in the inner scope covers the definition of the same name in the outer scope. Therefore, the level of the inner scope is higher than that of the outer scope.

In Cangjie, a pair of braces ({}) is used to enclose a segment of Cangjie code. In this way, a scope is constructed. Furthermore, nested scopes can be constructed by enclosing the scope with braces. These scopes comply with the preceding rules. Specifically, in a Cangjie source file, the scope to which the code that is not enclosed by any braces belongs is called the top-level scope, that is, the outermost scope in the file. According to the preceding rules, its scope level is the lowest.

Note:

When braces are used to enclose code segments and define a scope, functions and user-defined types can be declared, in addition to expressions.

For example, in the following Cangjie source file (test.cj), the name element is defined in the top-level scope, which is bound to the string "Cangjie". The name element is also defined in the code blocks led by main and if, corresponding to the integer 9 and integer 2023, respectively. According to the preceding scope rules, the value of element is "Cangjie" in line 4, 2023 in line 8, and 9 in line 10.

// test.cj
let element = "Cangjie"
main() {
    println(element)
    let element = 9
    if (element > 0) {
        let element = 2023
        println(element)
    }
    println(element)
}

Running the preceding program produces the output:

Cangjie
2023
9