Generics

If a declaration uses angle brackets to declare type parameters, it is called a generic declaration. When generic declarations are used, type parameters can be replaced with other types. You can define a generic function by declaring type parameters in a function signature. You can define a generic type by declaring type parameters in the definitions of class, interface, struct, enum, or type alias.

Type Parameters and Type Variables

In Cangjie, type parameters are represented by identifiers enclosed in <>. You can provide multiple type parameters by separating them with , inside the <>, such as<T1, T2, T3>, where T1, T2, T3 are type parameters.

Once type parameters are declared, they can be used as types.

When identifiers are used to reference declared type parameters, these identifiers are called type variables.

The syntax of type parameters is as follows:

typeParameters
    : '<' identifier (',' identifier)* '>'
    ;

Generic Constraints

In Cangjie, <: can be used to indicate that one type is a subtype of another. By declaring this relationship, you can impose constraints on generic type parameters, ensuring they can only be replaced with types that meet specific constraints.

Generic constraints are declared by the operator <: following where and consists of a lower bound and an upper bound. The part on the left of <: is referred to as the lower bound of the constraint, which can only be a type variable. The part on the right of <: is referred to as the upper bound of the constraint, which can be a type. When the upper bound is a type, the constraint is a subtype constraint. The upper bound can be of any type.

A type variable may be subject to multiple upper bound constraints at the same time. Multiple upper bounds of the same type parameter must be connected using &, which simplifies cases where a type parameter has several upper bounds. In essence, the type variable has multiple generic constraints. Constraints for different type variables should be separated by ,.

The syntax for declaring a constraint is as follows:

upperBounds
    :  type ('&' type)*
    ;

genericConstraints
    : 'where' identifier  '<:' upperBounds (',' identifier  '<:' upperBounds)*
    ;

The following example shows how to declare a generic constraint.

interface Enumerable<U> where U <: Bounded {...}

interface Comparable<T> {...}

func collectionCompare<T>(a: T, b: T) where T <: Comparable<T> & Seqence {...}

func sort<T, V>(input: T)
    where T <:  Enumerable<V>, V <: Object {...}

If the constraints of type variables X and Y are the same, it means that all types that meet the constraints of X meet those of Y, and all types that meet the constraints of Y also meet those of X.

If the constraints of the type variable X are stricter than those of Y, it means that all types that meet the constraints of X meet those of Y, but not the other way around.

If the constraints of X are stricter than those of Y, the constraints of Y are looser than those of X.

If two generic types have the same constraints, it means that they have the same number of type variables and all corresponding type variable constraints are identical.

If the constraint of a generic type A is stricter than that of another generic type B, it means that the number of type variables of A is the same as that of B, but the constraints of all type variables of A are stricter than those in B.

If the constraint of a generic type A is stricter than that of another generic type B, the constraint of B is looser than that of A.

For example, in the following two generics C and D, if I1 <: I2, then the constraint of C is stricter than that of D:

class C<X, Y> where X <: I1, Y <: I1 and class D<X, Y> where X <: I2, Y <: I2

Type Variance

Before moving on to generic functions and generic types, let's learn about the following type variance to comprehend the subtype relationships of generic types in Cangjie.

Definition

If A and B are types, and T is a type constructor with a type parameter X, then:

When T(A) <: T(B) if and only if A <: B, T is covariant at X.
When T(A) <: T(B) if and only if B <: A, T is contravariant at X.
When T(A) <: T(B) if and only if A = B, T is invariant.

Generic Invariance

In Cangjie, all generics are invariant. This means that if A is a subtype of B, there is no subtype relationship between ClassName<A> and ClassName<B>. The purpose is to ensure runtime safety.

Variance of Function Types

The parameter types of functions are contravariant, while the return types of functions are covariant. Suppose there is a function f1 with the type S1 -> T1, and another function f2 with the type S2 -> T2. If S2 <: S1 and T1 <: T2, then the type of f1 is a subtype of the type of f2.

Covariance of Tuple Types

There is a subtype relationship between tuples. If each element of one tuple is a subtype of the corresponding element in another tuple, then the first tuple is a subtype of the second one. Suppose there are tuples Tuple1 and Tuple2, with types (A1, A2, ..., An) and (B1, B2, ..., Bn). If for all i, it holds that Ai <: Bi, then Tuple1 <: Tuple2.

Restrictions on Variance

Currently, variance relationships are prohibited in two scenarios:

Interfaces implemented by types other than class. The subtype relationship between such types and their interfaces cannot serve as a basis for covariance and contravariance.
Interfaces implemented by the implementation types through extensions. The subtype relationship between such types and their interfaces cannot serve as a basis for covariance and contravariance.

These restrictions affect not only variance relationships but also the determination of override for subtype.

A type that is not applicable to the type variance relationship cannot be used as the basis for subtypes when overriding occurs.

interface I {
    func f(): Any
}
 
class Foo <: I {
    func f(): Int64 { // error
        ...
    }
}

Constraints on Upper Bounds

For a constraint L <: T<T1..Tn>, its upper bound is T<T1..Tn>. The declaration of T's type parameters may need to meet some constraints. After the argument Ti is replaced, these constraints must be implicitly introduced into the current context of the declaration. For example:

interface Eq<T> {
	func eq(other: T): Bool
}

interface Ord<T> where T <: Eq<T> {
	func lt(other: T): Bool
}

func foo<T>(a: T) where T <: Ord<T> {
	a.eq(a)
	a.lt(a)
}

For the foo function, even though only T is constrained by Ord, if Ord's T type is constrained by Eq, then the eq function from Eq can be used within the foo function. In this way, the constraint of the foo function is actually T <: Eq & Ord. Therefore, when a generic parameter meeting a constraint is declared, the constraints on the upper bounds are also introduced.

This rule of implicitly introducing constraints on upper bounds is also applicable to other generic declarations. For example:

interface A {}
class B<T> where T <: A {}
class C 

Note: Although the constraints of the upper bound in the current declaration are implicitly introduced, the current declaration can still explicitly write these constraints.

## Definitions of Generic Functions and Generic Types

### Generic Functions

If a function declares one or more type parameters, it is called a generic function. In terms of syntax, the type parameters follow the function name and are enclosed in `<>`. If there are multiple type parameters, they are separated by `,`.

```cangjie
func f<T>() {...}
func f1<T1, T2, T3, ...>() {...}

The syntax of a generic function is as follows:

functionDefinition
    : functionModifierList? 'func' identifier
    typeParameters functionParameters
    (':' type)? genericConstraints?
    block?
    ;

Note that:

When < and > are used, they are parsed as generics first. If they are parsed successfully, they are generic expressions. Otherwise, they are comparison operators. For example:

(c <d , e> (f))

This expression is parsed as a function call expression.

Generic Types

If a class, interface, struct, enum, or type alias definition declares one or more type parameters, it is called a generic type. In syntax, type parameters follow the type name (such as class name or interface name) and are enclosed by <>. If there are multiple type parameters, they are separated by ,.

class C<T1, T2> {...}	// generic class
interface I<T1> {...}	// generic interface
type BinaryOperator<T> = (T, T) -> T	// generic typealias

For the syntax of generic declarations, see the corresponding sections.

Generic Type Check

Checking Generic Declarations

Sanity Check for Generic Constraints

For all type parameters in a declaration, the upper bound of each type parameter can be classified into two cases:

The upper bound is also a type variable, which may be itself or another type variable.
When the upper bound is of a specific type, it can be further divided into two situations:
- The first situation involves upper bounds that are class or interface types, which are referred to as class-related types.
- The second situation involves upper bounds that are not class or interface types, which are referred to as class-independent types.
In Cangjie, the upper bounds of one or more specific types for a type variable must meet the following rules:
1. All upper bounds can only be the same case; they must either all be class-related types or class-independent types. For example, T <: Object & Int32 is invalid.
2. When the upper bound is a class-related type, if there are multiple upper bounds of classes, these classes must be on the same inheritance chain. This restriction does not apply to interfaces. Conflicting member definitions are not allowed in multiple generic upper bounds of a type variable. Specifically, conflicts refer to functions with the same name or operators that do not constitute overloads and there is no subtype relationship between return types.
3. When the upper bound is a class-independent type, only one class-independent specific type can be included. Two different types are not allowed. For example, T <: Int32 & Bool is invalid.
4. Recursive constraints cannot exist when the upper bound of a type variable is a class-independent type. A recursive generic constraint is one where the upper bound's type parameter directly or indirectly depends on the lower bound type variable. For example, the recursive generic constraint T <: Option<T> is invalid because Option is declared using the enum keyword. Similarly, T <: U, U <: (Int32) -> T is invalid because the function is of the value type and the T type indirectly depends on itself through U.

Type Compatibility Check

The purpose of type checking for generic declarations is to ensure that the generic type is compatible with its surrounding type context and the access to member functions and variables is valid. For example:

open class C {
    func coo() {...}
}

class D <: C {...}

interface Tr {
    func bar(): Int64
}

func foo<T>(a: T) {
    var b: C = a // error, T is not a subtype of C
    a.coo() // error, T has no member function coo
    a.bar() // error, T did not implement Tr
}

Three errors are reported in the foo function body of the preceding sample code. The causes are as follows:

The expected type of the variable b declared in the foo function is C. Therefore, you need to check whether T is a subtype of D, that is, T <: C. However, this constraint does not exist in the context of T. As a result, an error is reported during compilation of the variable declaration.
The generic type T is irrelevant to C in the current context. Therefore, the member function coo of C cannot be accessed.
Similarly, the T type does not have the constraint of Tr. Therefore, the member function bar of Tr cannot be accessed.

To pass the type check, add a generic constraint before the declaration body.

open class C {
	func coo() {...}
}

interface Tr {
    func bar(): Int64
}

func foo<T>(a: T) where T <: C & Tr {
	var b: C = a // OK, T is a sub type of C now
	a.coo() // OK, T is a sub type of C, so it has coo member function
	a.bar() // OK, T is constrained by Tr
}

In particular, if a generic upper bound of a type variable T includes a function type or a type that overloads the function call operator (), a value of type T may be considered as a function call. When the upper bound is a function type, the return type of the call is a return type of an upper bound of T. When the upper bound is a type that overloads the function call operator (), the return type of the call is a return type of a matched function call operator in the upper bound type.

Checking the Use of Generic Declarations

The primary purpose of checking the use of generic declarations is to substitute the arguments into the generic declaration's type parameters, and then verify whether the constraints are valid.

If the C type is used to call the foo function defined in the previous section:

main(): Int64 {
	foo<C>(C()) // error C does not implement Tr
	return 0
}

In this case, an error is reported indicating that type C does not implement Tr. This is because the constraints in the foo function specify T <: C & Tr. When T is substituted with C, C <: C is valid, but C <: Tr is invalid.

If the declaration of implementing Tr is added to the C type, the T <: Tr constraint can be met.

extend C <: Tr {
	func bar(): Int64 {...}
}

In particular, when an interface is used as a generic constraint, the type instantiated for the generic variable must fully implement the interface static function in all upper bound constraints.

This means that if a static function exists in an interface that is used as a generic constraint, the interface or abstract class that does not implement the corresponding static function cannot be used as the type instantiated for the generic variable.

interface I {
    static func f(): Unit
}

func g<T>(): Unit where T <: I {}

main() {
    g<I>() // error
    return 0
}

Depth of Generic Instantiation

To prevent infinite loops or memory exhaustion during generic instantiation, the number of instantiation layers is limited during compilation. For example:

class A<T>{
    func test(a: A<(A<T>, A<T>)>): Bool {true}
}

main(): Int64 {
    var a : A<Int32> = A<Int32>()
    return 0
}

This code throws an infinite instantiation error.

Generic Instantiation

The process of forming a corresponding non-generic language structure from a generic declaration after all type parameters are determined is called generic instantiation.

Instantiation of Generic Functions

func show<T>(a: T) where T <: ToString {
    a.toString()
}

After T is substituted with Int32, the following instance is generated (assuming that show$Int32 is the internal representation after compiler instantiation; similar representation is used in the following):

func show$Int32(a: Int32) {
    a.toString()
}

Restrictions on Instantiating Generic Functions

In Cangjie, generic functions cannot be declared in the following cases:

Non-static abstract functions in interfaces and abstract classes
Instance member functions in classes and abstract classes marked with the open keyword
Operator overloading functions

For example, the declaration and definition of the following functions are invalid.

abstract class AbsClass {
	public func foo<T>(a: T): Unit // error: abstract generic function in abstract class
	public open func bar<T>(a: T) { // error: open generic function in abstract class
		...
	}
}

interface IF {
	func foo<T>(a: T): Unit // error: abstract generic function in interface
}

open class Foo {
	public open func foo<T>(a: T): Unit { // error: open generic function in class
    	... 
    } 
}

The following generic functions are valid.

class Foo {
	static func foo<T>(a: T) {...} // generic static function in class
	func bar<T>(a: T) {...} // generic non-open function in class
}

abstract class Bar {
	func bar<T>(a: T) {...} // generic non-open function in abstract class
}

struct R {
	func foo<T>(a: T) {...} // generic function in struct
}

enum E {
    A | B | C
    func e<T>(a: T) {...} // generic function in enum
}

Instantiation of Classes and Interfaces

class Foo<T> <: IBar<T>{
    var a: T
    init(a: T) {
        this.a = a
    }
    static func foo(a: T) {...}
    public func bar(a: T, b: Int32): Unit {...}
}

interface IBar<T> {
    func bar(a: T, b: Int32): Unit
}

When T=Int32 is specified, the declaration of the following instance is generated:

class Foo$Int32 <: IBar$Int32 {
    var a: Int32
    static func foo(a: Int32) {...}
    func bar(a: Int32) {...}
}

interface IBar$Int32 {
    func bar(a: Int32, b: Int32)
}

Instantiation of Structs

The instantiation of a struct is very similar to that of a class.

struct Foo<T> {
    func foo(a: T) {...}
}

When T=Int32 is specified, the declaration of the following instance is generated:

struct Foo$Int32 {
    func foo(a: Int32) {...}
}

Instantiation of Enums

enum Either<T,R> {
      Left(T)
    | Right(R)
}

When Either is given the parameters Int32 and Bool, the type is instantiated as follows:

enum Either$Int32$Bool {
	  Left(Int32)
 	| Right(Bool)
}

When a generic declaration is used, for example, a generic function is called or a value of a generic type is constructed, what actually take effect are instances with all type parameters determined during compilation. That is, instantiation occurs only after all generic parameters are of a specific type.

Generic Function Overloading

Cangjie supports overloading between generic functions as well as between generic functions and non-generic functions. For details about overloading, see [Function Overloading].

When a function is called, overloading is processed as follows:

Step 1: Construct a candidate set for function calls. Functions included in this set must pass type checks and are callable. For details, see [Candidate Set of Overloaded Functions]. There are additional rules for generic functions, which will be detailed below.

Step 2: Select the best matching function based on the scope priority rules (see [Scope Priority]) and the best matching rules (see [Best Matching Rules]). If the unique best matching function is not determined, an error is reported.

Step 3: If there are multiple argument types, determine the parameter types based on the best matching function. If the unique argument type is not determined, an error is reported.

When constructing the candidate set for function calls, pay attention to the following points for generic functions:

1. When a function is called, for example, generic function f, the candidate set may include either a partially instantiated generic function or a fully instantiated function. Which form enters the candidate set depends on the form of the call expression.

The call expression is in the format of C<TA>.f(A), that is, f is a static member function of a generic type. The type is instantiated first, and then the type check of function call for the static member function of the instantiated type is performed. The function enters the candidate set after it passes the check. If C has a type parameter X, the function that enters the candidate set is f', where X is replaced with TA in f.

$$ \sigma = [X \mapsto TA] $$ $$ f' = \sigma f $$
```
// The context contains the following types: Base, Sub, and Sub <: Base.
class F<X> {
   static func foo<Y>(a: X, b: Y) {}  // foo1
   static func foo<Z>(a: Sub, b: Z) {}  // foo2   
}

/* The functions that enter the candidate set are foo1 and partial instantiated  foo2: foo<Y>(a:Base, b: Y) and foo<Z>(a: Sub, b: Z) */
var f = F<Base>.foo(Sub(), Base())  // foo2  
```
The format of the call expression is obj.f(A), and obj is an instance of the instantiated type of the generic type. That is, f is a non-static member function of a generic type.

In the expression, the type of obj needs to be determined first, and then the candidate set function is chosen based on the type of obj. The type of obj is the instantiated type. The type check of function call for the static member function of the instantiated type is performed. The function enters the candidate set after it passes the check.
```
// The context contains the following types: Base, Sub, and Sub <: Base.
class C<T, U>{
    init (a: T, b: U) {}
    func foo(a: T, b: U) {}     // foo1
    func foo(a: Base, b: U) {}   // foo2
}

/* It is inferred that the type of obj is C<Sub, Rune>.
   The functions that enter the candidate set are instantiated foo1, foo2:   
   foo(a:Sub, b:Rune) and foo(a: Base, b: Rune)
 */
main() {
    C(Sub(), 'a').foo(Sub(), 'a')   // choose foo1
    return 0
}
```

2. If no type parameter is provided when the function is called, that is, the function is called in the form of f(a), the generic function f that enters the candidate set must meet the following requirements:

Assuming f is a generic function in the format of f<X_1,...,X_m>(p1: T_1, ..., pn: T_n): R. No type parameter is provided in the call expression, so the format is f(a_1, ..., a_n). f can infer a group of type parameters TA_1, ...,TA_m based on the parameter type (A1, ...,An), which meet all generic constraints of f. In addition, when X_1, ...,X_m are replaced with TA_1, ...,TA_m in f, it can pass the type check of function call. The check rules are as follows:
- After the inferred type parameter TA_1, ...,TA_m are substituted into the function parameter (T1, ...,Tn) of f, (A1, ...,An) are the subtype of the substituted parameter type in the context where the call expression is located.
$$ \sigma = [X_1 \mapsto TA_1, ..., X_m \mapsto TA_m] $$ $$ \Delta \vdash (A1, ..., An) <: \sigma (T_{1},...,T_{n}) $$
- If the call expression provides the return type RA, type check needs to be performed based on the return type. After X_1, ...,X_m in the return type R of f are replaced with TA_1, ...,TA_m, the substituted return type is the subtype of RA in the context where the call expression is located.

$$ \sigma = [X_1 \mapsto TA_1, ..., X_m \mapsto TA_m] $$ $$ \Delta \vdash \sigma R <: RA $$

3. If type parameters are provided during function call, that is, the function call format is f<TA>(a), the f that enters the candidate set must meet the following requirements:

The number of type parameters of f is the same as that of TA. The type parameter TA meets the generic constraint of f. After the type parameters are substituted, the function can pass the type check of function call.