Cangjie-C Interoperability

To ensure compatibility with existing ecosystems, Cangjie supports calling C language functions and allows C language functions to call Cangjie functions.

Cangjie Calling C Functions

To call a C function from Cangjie, declare the function in Cangjie with the @C and foreign modifiers. @C can be omitted when foreign is present.

For example, suppose you want to call the C functions rand and printf, whose signatures are as follows:

// stdlib.h
int rand();

// stdio.h
int printf(const char *fmt, ...);

The two functions can then be called from Cangjie as follows:

// declare the function by `foreign` keyword, and omit `@C`
foreign func rand(): Int32
foreign func printf(fmt: CString, ...): Int32

main() {
    // call this function by `unsafe` block
    let r = unsafe { rand() }
    println("random number ${r}")
    unsafe {
        var fmt = LibC.mallocCString("Hello, No.%d\n")
        printf(fmt, 1)
        LibC.free(fmt)
    }
}

Note that:

  1. foreign modifies a function declaration and marks the function as external. A function modified by foreign can only be declared; it cannot have an implementation.
  2. The parameter and return types of a function declared with foreign must comply with the mapping between C and Cangjie data types. For details, see [Type Mapping](./cangjie-c.md#type-mapping).
  3. C functions may perform unsafe operations. Therefore, a call to a function modified by foreign must be wrapped in an unsafe block. Otherwise, a compilation error occurs.
  4. foreign, with or without @C, can modify only function declarations. Using it on any other declaration causes a compilation error.
  5. @C supports only foreign functions, non-generic functions in the top-level scope, and struct types.
  6. foreign functions do not support named parameters or default parameter values. They allow variable-length parameters, which are written as ... and can appear only at the end of the parameter list. Each variable-length argument must meet the CType constraint, but the arguments do not need to be of the same type.
  7. Although Cangjie (CJNative backend) can expand the stack, it cannot detect how much stack a C-side function actually uses. Therefore, a stack overflow may still occur after an FFI call enters the C-side function. Adjust the cjStackSize configuration to suit the actual situation.

Examples of invalid foreign declarations are as follows:

foreign func rand(): Int32 { // compiler error
    return 0
}
@C
foreign var a: Int32 = 0 // compiler error
@C
foreign class A{} // compiler error
@C
foreign interface B{} // compiler error

CFunc

In Cangjie, CFunc refers to the function that can be called by C language code. There are three forms:

  1. A foreign function modified by @C.
  2. A Cangjie function modified by @C.
  3. A lambda expression of the CFunc type. Unlike a common lambda expression, a CFunc lambda expression cannot capture variables.

// Case 1
foreign func free(ptr: CPointer<Int8>): Unit

// Case 2
@C
func callableInC(ptr: CPointer<Int8>) {
    print("This function is defined in Cangjie.")
}

// Case 3
let f1: CFunc<(CPointer<Int8>) -> Unit> = { ptr =>
    print("This function is defined with CFunc lambda.")
}

The type of functions declared or defined in the preceding three forms is CFunc<(CPointer<Int8>) -> Unit>. CFunc corresponds to the function pointer type of the C language. This type is a generic type. Its generic parameter indicates the type of the CFunc input parameter and return value. The usage is as follows:

foreign func atexit(cb: CFunc<() -> Unit>): Int32

Like foreign functions, other CFunc functions must use parameter and return types that meet the CType constraint, and they do not support named parameters or default parameter values.

When CFunc is called in Cangjie code, it must be in the unsafe context.

The Cangjie language can convert a variable of the CPointer<T> type to a specific CFunc. The generic parameter T of CPointer can be any type that meets the CType constraint. The method is as follows:

main() {
    var ptr = CPointer<Int8>()
    var f = CFunc<() -> Unit>(ptr)
    unsafe { f() } // core dumped when running, because the pointer is nullptr.
}

Note:

It is dangerous to forcibly convert a pointer to a CFunc and call it. Ensure that the pointer holds a valid function address. Otherwise, a runtime error occurs.

inout Parameter

When a CFunc is called in Cangjie, an argument can be modified by the inout keyword to form a pass-by-reference expression. The argument is then passed by reference. The type of a pass-by-reference expression is CPointer<T>, where T is the type of the expression modified by inout.

A pass-by-reference expression has the following restrictions:

  • It can be used only in a call to a CFunc.
  • The type of the modified object must meet the CType constraint, but cannot be CString.
  • The modified object cannot be a variable defined with let, nor a temporary value such as a literal, an input parameter, or the value of another expression.
  • The pointer passed to the C side through a pass-by-reference expression is valid only for the duration of the function call. The C side must therefore not save the pointer for later use.

Variables modified by inout can be variables defined in top-level scope, local variables, and member variables in struct types, but cannot be directly or indirectly derived from instance member variables of class types.

The following is an example:

foreign func foo1(ptr: CPointer<Int32>): Unit

@C
func foo2(ptr: CPointer<Int32>): Unit {
    let n = unsafe { ptr.read() }
    println("*ptr = ${n}")
}

let foo3: CFunc<(CPointer<Int32>) -> Unit> = { ptr =>
    let n = unsafe { ptr.read() }
    println("*ptr = ${n}")
}

struct Data {
    var n: Int32 = 0
}

class A {
    var data = Data()
}

main() {
    var n: Int32 = 0
    unsafe {
        foo1(inout n)  // OK
        foo2(inout n)  // OK
        foo3(inout n)  // OK
    }
    var data = Data()
    var a = A()
    unsafe {
        foo1(inout data.n)   // OK
        foo1(inout a.data.n) // Error, n is derived indirectly from instance member variables of class A
    }
}
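
The example declares foo1 as a foreign function without showing its C side. The following is a minimal sketch of what that side might look like. The body is an assumption made for illustration; the point is that the pointer produced by inout is used only while the call is running and is never stored.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical C-side definition of foo1 (the body is assumed).
 * The pointer comes from `inout n` on the Cangjie side and is valid
 * only during this call, so it is used here and not saved anywhere. */
void foo1(int32_t *ptr)
{
    if (ptr == NULL) {
        return;
    }
    printf("*ptr = %d\n", (int)*ptr);
    *ptr += 1; /* write through the pointer to the caller's variable */
}
```

Because the argument is passed by reference, a write such as the increment above should be observable in the Cangjie variable once the call returns.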

Note:

When the macro expansion feature is used, inout parameters cannot be used in a macro definition.

unsafe

Interoperability with C also brings in many of the unsafe factors of C. Cangjie therefore uses the unsafe keyword to mark the unsafe behaviors involved in calls across the C boundary.

The unsafe keyword is described as follows:

  • unsafe can be used to modify functions, expressions, or a scope.
  • Functions modified by @C must be called in the unsafe context.
  • When CFunc is called, it must be used in the unsafe context.
  • When a foreign function is called in Cangjie, the call must be in the unsafe context.
  • When the called function is modified by unsafe, the call must be in the unsafe context.

The method is as follows:

foreign func rand(): Int32

@C
func foo(): Unit {
    println("foo")
}

var foo1: CFunc<() -> Unit> = { =>
    println("foo1")
}

main(): Int64 {
    unsafe {
        rand()           // Call foreign func.
        foo()            // Call @C func.
        foo1()           // Call CFunc var.
    }
    0
}

Note that a common lambda expression cannot carry the unsafe attribute. Once a lambda expression that performs unsafe operations escapes, it can be called directly outside an unsafe context without any compilation error. To call an unsafe function inside a lambda expression, you are advised to wrap the call in an unsafe block. See the following case:

unsafe func A(){}
unsafe func B(){
    var f = { =>
        unsafe { A() } // Avoid calling A() directly without unsafe in a normal lambda.
    }
    return f
}
main() {
    var f = unsafe{ B() }
    f()
    println("Hello World")
}

Calling Conventions

Function calling conventions describe how the caller and callee cooperate in a function call (for example, how parameters are passed and who cleans up the stack). The caller and callee must use the same calling convention to run properly. Cangjie uses @CallingConv to specify the calling convention. The supported calling conventions are as follows:

  • CDECL: the default calling convention used by the Clang C compiler on each platform.
  • STDCALL: the calling convention used by the Win32 API.

If a C function is called through the C interoperability mechanism and no calling convention is specified, CDECL is used by default. The following is an example of calling the rand function in the C standard library:

@CallingConv[CDECL]   // Can be omitted because CDECL is the default.
foreign func rand(): Int32

main() {
    println(unsafe { rand() })
}

@CallingConv can only be used to modify the foreign block, a single foreign function, and a CFunc function in the top-level scope. When @CallingConv modifies the foreign block, the same @CallingConv modification is added to each function in the foreign block.

Type Mapping

Base Types

The Cangjie and C languages support the mapping of basic data types. The general principles are as follows:

  1. The Cangjie type does not contain references pointing to the managed memory.
  2. The Cangjie type and the C type have the same memory layout.

For example, some basic type mapping relationships are as follows:

Cangjie Type    C Type      Size (bytes)
Unit            void        0
Bool            bool        1
UInt8           char        1
Int8            int8_t      1
UInt8           uint8_t     1
Int16           int16_t     2
UInt16          uint16_t    2
Int32           int32_t     4
UInt32          uint32_t    4
Int64           int64_t     8
UInt64          uint64_t    8
IntNative       ssize_t     platform dependent
UIntNative      size_t      platform dependent
Float32         float       4
Float64         double      8

Note:

Because the sizes of the int and long types differ across platforms, programmers must choose the corresponding Cangjie type themselves. In C interoperability scenarios, as with void in C, the Unit type can be used only as the return type of a CFunc and as the generic parameter of CPointer.
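
The sizes in the table can be checked on the C side. The sketch below asserts them at compile time for the fixed-width types and reports the width of long, which is the platform-dependent case the note refers to. The helper name long_bits is hypothetical, and the bool, float, and double sizes are assumed to follow mainstream ABIs.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width C types have the same size on every platform that
 * provides them, which is why they map directly to Cangjie types. */
_Static_assert(sizeof(bool) == 1, "Bool <-> bool (mainstream ABIs)");
_Static_assert(sizeof(int8_t) == 1 && sizeof(uint8_t) == 1, "Int8 / UInt8");
_Static_assert(sizeof(int16_t) == 2 && sizeof(uint16_t) == 2, "Int16 / UInt16");
_Static_assert(sizeof(int32_t) == 4 && sizeof(uint32_t) == 4, "Int32 / UInt32");
_Static_assert(sizeof(int64_t) == 8 && sizeof(uint64_t) == 8, "Int64 / UInt64");
_Static_assert(sizeof(float) == 4 && sizeof(double) == 8, "Float32 / Float64");

/* Width of `long` in bits on the current platform (hypothetical helper). */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}
```

On 64-bit Linux, long_bits() is typically 64, so a C long would be declared as Int64; on 64-bit Windows it is typically 32, so the same C type would be declared as Int32.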

Cangjie also supports the mapping with the structures and pointer types of the C language.

Structure

For the structure type, Cangjie uses struct modified by @C. For example, the C language has the following structure:

typedef struct {
    long long x;
    long long y;
    long long z;
} Point3D;

The corresponding Cangjie type can be defined as follows:

@C
struct Point3D {
    var x: Int64 = 0
    var y: Int64 = 0
    var z: Int64 = 0
}

If the C language contains such a function:

Point3D addPoint(Point3D p1, Point3D p2);

Accordingly, the function can be declared in Cangjie as follows:

foreign func addPoint(p1: Point3D, p2: Point3D): Point3D
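
For reference, a C-side implementation of addPoint might look like the following sketch. The text gives only the prototype, so the component-wise sum is an assumption.

```c
typedef struct {
    long long x;
    long long y;
    long long z;
} Point3D;

/* Hypothetical implementation: returns the component-wise sum.
 * Both arguments and the result are passed by value, matching the
 * Cangjie declaration that uses the @C struct Point3D directly. */
Point3D addPoint(Point3D p1, Point3D p2)
{
    Point3D sum = { p1.x + p2.x, p1.y + p2.y, p1.z + p2.z };
    return sum;
}
```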

The struct modified by @C must meet the following requirements:

  • The type of a member variable must meet the CType constraint.
  • It cannot implement or extend interface types.
  • It cannot be used as an associated value type of an enum.
  • Closure capture is not allowed.
  • Generic parameters are not allowed.

The struct modified by @C automatically meets the CType constraint.

Pointer

For the pointer type, Cangjie provides the CPointer<T> type to correspond to the pointer type on the C side. The generic parameter T must meet the CType constraint. For example, the signature of the malloc function in C is as follows:

void* malloc(size_t size);

In Cangjie, it can be declared as follows:

foreign func malloc(size: UIntNative): CPointer<Unit>

The CPointer can be used for read and write, offset calculation, null check, and pointer conversion. For details about the API, see "Cangjie Programming Language Library API". Read, write, and offset calculation are unsafe behaviors. When invalid pointers call these functions, undefined behaviors may occur. These unsafe functions need to be called in unsafe blocks.

The following is an example of using CPointer:

foreign func malloc(size: UIntNative): CPointer<Unit>
foreign func free(ptr: CPointer<Unit>): Unit

@C
struct Point3D {
    var x: Int64
    var y: Int64
    var z: Int64

    init(x: Int64, y: Int64, z: Int64) {
        this.x = x
        this.y = y
        this.z = z
    }
}

main() {
    let p1 = CPointer<Point3D>() // create a CPointer with null value
    if (p1.isNull()) {  // check if the pointer is null
        print("p1 is a null pointer")
    }

    let sizeofPoint3D: UIntNative = 24
    var p2 = unsafe { malloc(sizeofPoint3D) }    // malloc a Point3D in heap
    var p3 = unsafe { CPointer<Point3D>(p2) }    // pointer type cast

    unsafe { p3.write(Point3D(1, 2, 3)) } // write data through pointer

    let p4: Point3D = unsafe { p3.read() } // read data through pointer

    let p5: CPointer<Point3D> = unsafe { p3 + 1 } // offset of pointer

    unsafe { free(p2) }
}
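
For comparison, the same sequence written in C is sketched below. The comments pair each statement with the CPointer operation it roughly corresponds to; the pairing is an informal reading of the example, and the function name point3d_roundtrip is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int64_t x;
    int64_t y;
    int64_t z;
} Point3D;

/* Allocates one Point3D, writes through the pointer, reads the value
 * back into *out, and frees the memory.
 * Returns 0 on success and -1 if malloc fails. */
int point3d_roundtrip(Point3D *out)
{
    Point3D *p3 = malloc(sizeof(Point3D)); /* malloc plus pointer cast */
    if (p3 == NULL) {
        return -1;
    }
    *p3 = (Point3D){ 1, 2, 3 };            /* roughly p3.write(...) */
    *out = *p3;                            /* roughly p3.read() */
    Point3D *p5 = p3 + 1;                  /* roughly p3 + 1; never dereferenced */
    (void)p5;
    free(p3);
    return 0;
}
```

Here sizeof(Point3D) evaluates to 24 on platforms with 8-byte int64_t, which is the value the Cangjie example hard-codes.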

Cangjie supports forcible type conversion between CPointer. The generic parameter T of CPointer before and after the conversion must meet the constraints of CType. The method is as follows:

main() {
    var pInt8 = CPointer<Int8>()
    var pUInt8 = CPointer<UInt8>(pInt8) // CPointer<Int8> convert to CPointer<UInt8>
    0
}

Cangjie can convert a variable of the CFunc type to a specific CPointer. The generic parameter T of CPointer can be any type that meets the CType constraint. The method is as follows:

foreign func rand(): Int32
main() {
    var ptr = CPointer<Int8>(rand)
    0
}

Note:

Converting a CFunc to a pointer is safe. However, do not read or write through the converted pointer, because doing so may cause runtime errors.

Array

Cangjie uses the VArray type to map to the array type of C. The VArray type can be used as a function parameter or @C struct member. When the element type T in VArray<T, $N> meets the CType constraint, the VArray<T, $N> type also meets the CType constraint.

As a function parameter type:

When a VArray is passed as an argument to a CFunc, the corresponding parameter type in the CFunc signature can only be CPointer<T> or VArray<T, $N>. If the parameter type in the signature is VArray<T, $N>, the argument is still passed as a CPointer<T>.

The following is an example of using VArray as a parameter:

foreign func cfoo1(a: CPointer<Int32>): Unit
foreign func cfoo2(a: VArray<Int32, $3>): Unit

The corresponding C-side function definition may be as follows:

void cfoo1(int *a) { ... }
void cfoo2(int a[3]) { ... }

When calling CFunc, you need to use inout to modify the variable of the VArray type.

var a: VArray<Int32, $3> = [1, 2, 3]
unsafe {
    cfoo1(inout a)
    cfoo2(inout a)
}

VArray cannot be used as the return value type of CFunc.
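
The two C-side functions can be filled in as in the sketch below to show why both declarations receive a pointer. The bodies are assumptions made for illustration, and int32_t is used in place of int so that the element type matches Int32 exactly.

```c
#include <stdint.h>

/* Hypothetical body: adds 1 to each of the first three elements.
 * The length is not carried by the pointer, so 3 is assumed here. */
void cfoo1(int32_t *a)
{
    for (int i = 0; i < 3; i++) {
        a[i] += 1;
    }
}

/* Hypothetical body: doubles each element.
 * In C, an array parameter such as `int32_t a[3]` is adjusted to
 * `int32_t *a`, so the caller's array is modified in place. */
void cfoo2(int32_t a[3])
{
    for (int i = 0; i < 3; i++) {
        a[i] *= 2;
    }
}
```

Since both functions operate on the caller's storage, any change they make is made to the array that was passed in, not to a copy.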

As a member of @C struct:

When VArray is a member of @C struct, its memory layout is the same as the structure layout on the C side. Ensure that the declaration length and type on the Cangjie side are the same as those on the C side.

struct S {
    int a[2];
    int b[0];
};

In Cangjie, the following structure can be declared to correspond to the C code:

@C
struct S {
    var a = VArray<Int32, $2>(item: 0)
    var b = VArray<Int32, $0>(item: 0)
}

Note:

In the C language, the last field of a structure can be an array whose length is not specified. The array is called a flexible array. Cangjie does not support the mapping of structures that contain flexible arrays.
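
To confirm that the declared lengths and element types match between the two sides, the C layout can be pinned down with compile-time checks. The sketch below uses a hypothetical struct Samples with two fixed-size array members; on the Cangjie side these would presumably be declared as VArray<Int32, $2> and VArray<Int32, $3>.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct used only for illustration. */
struct Samples {
    int32_t a[2];
    int32_t b[3];
};

/* Array members are stored inline, one element after another. */
_Static_assert(offsetof(struct Samples, b) == 2 * sizeof(int32_t),
               "b starts right after the two elements of a");
_Static_assert(sizeof(struct Samples) == 5 * sizeof(int32_t),
               "five int32_t elements, no padding");

/* Total size in bytes, for comparison with the other side. */
size_t samples_size(void)
{
    return sizeof(struct Samples);
}
```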

Character String

For C strings in particular, Cangjie provides the CString type. To simplify operations on C strings, CString provides the following member functions:

  • init(p: CPointer<UInt8>): constructs a CString through CPointer.
  • func getChars(): CPointer<UInt8>: obtains the address of the character string.
  • func size(): Int64: calculates the length of the character string.
  • func isEmpty(): Bool: checks whether the length of the string is 0. Returns true if the pointer of the string is null.
  • func isNotEmpty(): Bool: checks whether the length of the string is not 0. Returns false if the pointer of the string is null.
  • func isNull(): Bool: checks whether the pointer of the character string is null.
  • func startsWith(str: CString): Bool: checks whether the character string starts with str.
  • func endsWith(str: CString): Bool: checks whether the character string ends with str.
  • func equals(rhs: CString): Bool: checks whether the character string is equal to rhs.
  • func equalsLower(rhs: CString): Bool: checks whether the character string is equal to rhs, ignoring case.
  • func subCString(start: UInt64): CString: extracts the substring that begins at start and stores it in newly allocated space.
  • func subCString(start: UInt64, len: UInt64): CString: extracts the substring of length len that begins at start and stores it in newly allocated space.
  • func compare(str: CString): Int32: compares this string with str and returns the same result as strcmp(this, str) in C.
  • func toString(): String: constructs a new String object using this string.
  • func asResource(): CStringResource: obtains the resource type of CString.

In addition, the mallocCString function in LibC can be called to convert a String to a CString. The resulting CString must be released after use.

The following is an example of using CString:

foreign func strlen(s: CString): UIntNative

main() {
    var s1 = unsafe { LibC.mallocCString("hello") }
    var s2 = unsafe { LibC.mallocCString("world") }

    let t1: Int64 = s1.size()
    let t2: Bool = s2.isEmpty()
    let t3: Bool = s1.equals(s2)
    let t4: Bool = s1.startsWith(s2)
    let t5: Int32 = s1.compare(s2)

    let length = unsafe { strlen(s1) }

    unsafe {
        LibC.free(s1)
        LibC.free(s2)
    }
}
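
The C standard library calls that these member functions resemble are sketched below. Only compare is stated in the text to match strcmp; treating size as strlen and startsWith as a strncmp prefix test is an assumed analogy, and the helper starts_with is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Prefix test built on strncmp: true when s begins with prefix. */
bool starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Sign of strcmp(a, b): -1, 0, or 1. strcmp itself guarantees only
 * the sign of its result, not a particular magnitude. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

With the strings from the example, strlen("hello") is 5, starts_with("hello", "world") is false, and compare_sign("hello", "world") is -1 because 'h' sorts before 'w'.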

sizeOf/alignOf

Cangjie also provides the sizeOf and alignOf functions to obtain the memory usage and memory alignment values (in bytes) of the preceding C interoperability types. The function declaration is as follows:

public func sizeOf<T>(): UIntNative where T <: CType
public func alignOf<T>(): UIntNative where T <: CType

Example:

@C
struct Data {
    var a: Int64 = 0
    var b: Float32 = 0.0
}

main() {
    println(sizeOf<Data>())
    println(alignOf<Data>())
}

If you run the program on a 64-bit computer, the following information is displayed:

16
8
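
The same values can be reproduced on the C side, which shows where the 16 comes from: the two fields occupy 12 bytes, and the struct is padded up to a multiple of its 8-byte alignment. This is a sketch assuming a typical 64-bit ABI; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* C counterpart of the @C struct Data in the example. */
typedef struct {
    int64_t a; /* 8 bytes */
    float b;   /* 4 bytes, followed by tail padding */
} Data;

/* Size of Data in bytes, including padding. */
size_t data_size(void)
{
    return sizeof(Data);
}

/* Alignment requirement of Data in bytes. */
size_t data_align(void)
{
    return _Alignof(Data);
}
```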

CType

In addition to the types that are mapped to C-side types provided in "Type Mapping", Cangjie provides a CType interface. The interface does not contain any method and can be used as the parent type of all types supported by C interoperability for easy use in generic constraints.

Note that:

  1. The CType interface itself does not meet the CType constraint.
  2. The CType interface cannot be inherited or extended.
  3. The CType interface does not break the usage restrictions of subtypes.

The following is an example of using CType:

func foo<T>(x: T): Unit where T <: CType {
    match (x) {
        case i32: Int32 => println(i32)
        case ptr: CPointer<Int8> => println(ptr.isNull())
        case f: CFunc<() -> Unit> => unsafe { f() }
        case _ => println("match failed")
    }
}

main() {
    var i32: Int32 = 1
    var ptr = CPointer<Int8>()
    var f: CFunc<() -> Unit> = { => println("Hello") }
    var f64 = 1.0
    foo(i32)
    foo(ptr)
    foo(f)
    foo(f64)
}

The result is as follows:

1
true
Hello
match failed

C Calling Cangjie Functions

Cangjie provides the CFunc type to correspond to the function pointer type on the C side. The function pointer on the C side can be transferred to Cangjie, and Cangjie can also construct and transfer a variable corresponding to the function pointer on the C side.

Assume that a C library API is as follows:

typedef void (*callback)(int);
void set_callback(callback cb);

Correspondingly, the function in Cangjie can be declared as follows:

foreign func set_callback(cb: CFunc<(Int32) -> Unit>): Unit

Variables of the CFunc type can be transferred from the C side or constructed on the Cangjie side. There are two methods to construct the CFunc type on the Cangjie side. One is to use the function modified by @C, and the other is to use the closure marked as CFunc.

The function modified by @C indicates that its function signature meets the calling rules of C and the definition is still written in Cangjie. The function modified by foreign is defined on the C side.

Note:

Functions modified by foreign or @C are CFunc functions. You are advised not to name them with the prefix CJ_ (case-insensitive). Otherwise, the names may conflict with internal compiler symbols, such as those of the standard library and runtime, resulting in undefined behavior.

Example:

@C
func myCallback(s: Int32): Unit {
    println("handle ${s} in callback")
}

main() {
    // the argument is a function qualified by `@C`
    unsafe { set_callback(myCallback) }

    // the argument is a lambda with `CFunc` type
    let f: CFunc<(Int32) -> Unit> = { i => println("handle ${i} in callback") }
    unsafe { set_callback(f) }
}
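
The C library behind set_callback is not shown in the text. A minimal sketch of one possible implementation follows: it stores the function pointer and invokes it later. Everything except the callback typedef and the set_callback prototype is hypothetical, including fire_callback and the remember_value stub, which exists only so that the library can be exercised from C without Cangjie.

```c
#include <stddef.h>

typedef void (*callback)(int);

static callback g_cb = NULL; /* most recently registered callback */

void set_callback(callback cb)
{
    g_cb = cb;
}

/* Hypothetical helper: invokes the registered callback with value.
 * Returns 1 if a callback was invoked and 0 if none is registered. */
int fire_callback(int value)
{
    if (g_cb == NULL) {
        return 0;
    }
    g_cb(value);
    return 1;
}

/* Hypothetical C-only stub callback that records its argument. */
int last_value = 0;

void remember_value(int v)
{
    last_value = v;
}
```

From the point of view of this library, a function modified by @C and a CFunc lambda expression are both just function pointers of type callback.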

Assume that the C code is compiled into the library libmyfunc.so. Run the compilation command cjc -L. -lmyfunc test.cj -o test.out so that the Cangjie compiler links against the library and generates the desired executable program.

In addition, when compiling the C code, enable the -fstack-protector-all or -fstack-protector-strong stack protection option. Cangjie code has overflow checking and stack protection by default. Once C code is introduced, you must ensure that overflows cannot occur in unsafe blocks.

Compiler Options

To use C interoperability, you need to manually link the C library. The Cangjie compiler provides corresponding options.

  • --library-path <value>, -L <value>, -L<value>: specifies the directory of the library file to be linked.

    --library-path <value>: adds the specified path to the library file search paths of the linker. The path specified by the environment variable LIBRARY_PATH will also be added to the library file search paths of the linker. The path specified by --library-path has a higher priority than the path specified by LIBRARY_PATH.

  • --library <value>, -l <value>, -l<value>: specifies the library file to be linked.

    The specified library file is directly transferred to the linker. The library file name must be in the lib[arg].[extension] format.

For details about all compilation options supported by the cjc compiler, see [CJC Compilation Options](../Appendix/compile_options_OHOS.md).

Example

Assume that there is a C library libpaint.so whose header file is as follows:

#include <stdint.h>

typedef struct {
    int64_t x;
    int64_t y;
} Point;

typedef struct {
    int64_t x;
    int64_t y;
    int64_t r;
} Circle;

int32_t DrawPoint(const Point* point);
int32_t DrawCircle(const Circle* circle);

The sample code for using the C library in the Cangjie code is as follows:

// main.cj
foreign {
    func DrawPoint(point: CPointer<Point>): Int32
    func DrawCircle(circle: CPointer<Circle>): Int32

    func malloc(size: UIntNative): CPointer<Int8>
    func free(ptr: CPointer<Int8>): Unit
}

@C
struct Point {
    var x: Int64 = 0
    var y: Int64 = 0
}

@C
struct Circle {
    var x: Int64 = 0
    var y: Int64 = 0
    var r: Int64 = 0
}

main() {
    let SIZE_OF_POINT: UIntNative = 16
    let SIZE_OF_CIRCLE: UIntNative = 24
    let ptr1 = unsafe { malloc(SIZE_OF_POINT) }
    let ptr2 = unsafe { malloc(SIZE_OF_CIRCLE) }

    let pPoint = CPointer<Point>(ptr1)
    let pCircle = CPointer<Circle>(ptr2)

    var point = Point()
    point.x = 10
    point.y = 20
    unsafe { pPoint.write(point) }

    var circle = Circle()
    circle.r = 1
    unsafe { pCircle.write(circle) }

    unsafe {
        DrawPoint(pPoint)
        DrawCircle(pCircle)

        free(ptr1)
        free(ptr2)
    }
}

Run the following command to compile the Cangjie code (using the CJNative backend as an example):

cjc -L . -l paint ./main.cj

In the compilation command, -L . tells the linker to search for the library in the current directory (assuming that libpaint.so is in the current directory), and -l paint specifies the name of the library to link. After the compilation succeeds, the binary file main is generated by default. The command for running the binary file is as follows:

LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./main
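
The example assumes that libpaint.so already exists. For experimenting, a stand-in library can be written in a few lines of C. The sketch below is hypothetical: the header gives only the prototypes, so the bodies (printing the shape, returning 0 on success and -1 for a null pointer) are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int64_t x;
    int64_t y;
} Point;

typedef struct {
    int64_t x;
    int64_t y;
    int64_t r;
} Circle;

/* Hypothetical body: prints the point. Returns 0, or -1 for NULL. */
int32_t DrawPoint(const Point *point)
{
    if (point == NULL) {
        return -1;
    }
    printf("Point(%lld, %lld)\n", (long long)point->x, (long long)point->y);
    return 0;
}

/* Hypothetical body: prints the circle. Returns 0, or -1 for NULL. */
int32_t DrawCircle(const Circle *circle)
{
    if (circle == NULL) {
        return -1;
    }
    printf("Circle(%lld, %lld, r=%lld)\n", (long long)circle->x,
           (long long)circle->y, (long long)circle->r);
    return 0;
}
```

On a typical Linux toolchain, a file like this (say paint.c) could be built into the shared library with a command such as cc -shared -fPIC paint.c -o libpaint.so before running the cjc command above.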