Generics
If a declaration uses angle brackets to declare type parameters, it is called a generic declaration. When generic declarations are used, type parameters can be replaced with other types. You can define a generic function by declaring type parameters in a function signature. You can define a generic type by declaring type parameters in the definitions of class, interface, struct, enum, or type alias.
Type Parameters and Type Variables
In Cangjie, type parameters are represented by identifiers enclosed in <>
. You can provide multiple type parameters by separating them with ,
inside the <>
, such as<T1, T2, T3>
, where T1, T2, T3
are type parameters.
Once type parameters are declared, they can be used as types.
When identifiers are used to reference declared type parameters, these identifiers are called type variables.
The syntax of type parameters is as follows:
typeParameters
: '<' identifier (',' identifier)* '>'
;
Generic Constraints
In Cangjie, <:
can be used to indicate that one type is a subtype of another. By declaring this relationship, you can impose constraints on generic type parameters, ensuring they can only be replaced with types that meet specific constraints.
Generic constraints are declared by the operator <:
following where
and consists of a lower bound and an upper bound. The part on the left of <:
is referred to as the lower bound of the constraint, which can only be a type variable. The part on the right of <:
is referred to as the upper bound of the constraint, which can be a type. When the upper bound is a type, the constraint is a subtype constraint. The upper bound can be of any type.
A type variable may be subject to multiple upper bound constraints at the same time. Multiple upper bounds of the same type parameter must be connected using &
, which simplifies cases where a type parameter has several upper bounds. In essence, the type variable has multiple generic constraints. Constraints for different type variables should be separated by ,
.
The syntax for declaring a constraint is as follows:
upperBounds
: type ('&' type)*
;
genericConstraints
: 'where' identifier '<:' upperBounds (',' identifier '<:' upperBounds)*
;
The following example shows how to declare a generic constraint.
interface Enumerable<U> where U <: Bounded {...}
interface Comparable<T> {...}
func collectionCompare<T>(a: T, b: T) where T <: Comparable<T> & Seqence {...}
func sort<T, V>(input: T)
where T <: Enumerable<V>, V <: Object {...}
If the constraints of type variables X
and Y
are the same, it means that all types that meet the constraints of X
meet those of Y
, and all types that meet the constraints of Y
also meet those of X
.
If the constraints of the type variable X
are stricter than those of Y
, it means that all types that meet the constraints of X
meet those of Y
, but not the other way around.
If the constraints of X
are stricter than those of Y
, the constraints of Y
are looser than those of X
.
If two generic types have the same constraints, it means that they have the same number of type variables and all corresponding type variable constraints are identical.
If the constraint of a generic type A
is stricter than that of another generic type B
, it means that the number of type variables of A
is the same as that of B
, but the constraints of all type variables of A
are stricter than those in B
.
If the constraint of a generic type A
is stricter than that of another generic type B
, the constraint of B
is looser than that of A
.
For example, in the following two generics C
and D
, if I1 <: I2
, then the constraint of C
is stricter than that of D
:
class C<X, Y> where X <: I1, Y <: I1
and class D<X, Y> where X <: I2, Y <: I2
Type Variance
Before moving on to generic functions and generic types, let's learn about the following type variance to comprehend the subtype relationships of generic types in Cangjie.
Definition
If A
and B
are types, and T
is a type constructor with a type parameter X
, then:
-
When
T(A) <: T(B)
if and only ifA <: B
,T
is covariant at X. -
When
T(A) <: T(B)
if and only ifB <: A
,T
is contravariant at X. -
When
T(A) <: T(B)
if and only ifA = B
,T
is invariant.
Generic Invariance
In Cangjie, all generics are invariant. This means that if A
is a subtype of B
, there is no subtype relationship between ClassName<A>
and ClassName<B>
. The purpose is to ensure runtime safety.
Variance of Function Types
The parameter types of functions are contravariant, while the return types of functions are covariant. Suppose there is a function f1
with the type S1 -> T1
, and another function f2
with the type S2 -> T2
. If S2 <: S1
and T1 <: T2
, then the type of f1
is a subtype of the type of f2
.
Covariance of Tuple Types
There is a subtype relationship between tuples. If each element of one tuple is a subtype of the corresponding element in another tuple, then the first tuple is a subtype of the second one. Suppose there are tuples Tuple1
and Tuple2
, with types (A1, A2, ..., An)
and (B1, B2, ..., Bn)
. If for all i
, it holds that Ai <: Bi
, then Tuple1 <: Tuple2
.
Restrictions on Variance
Currently, variance relationships are prohibited in two scenarios:
- Interfaces implemented by types other than class. The subtype relationship between such types and their interfaces cannot serve as a basis for covariance and contravariance.
- Interfaces implemented by the implementation types through extensions. The subtype relationship between such types and their interfaces cannot serve as a basis for covariance and contravariance.
These restrictions affect not only variance relationships but also the determination of override for subtype.
A type that is not applicable to the type variance relationship cannot be used as the basis for subtypes when overriding occurs.
interface I {
func f(): Any
}
class Foo <: I {
func f(): Int64 { // error
...
}
}
Constraints on Upper Bounds
For a constraint L <: T<T1..Tn>
, its upper bound is T<T1..Tn>
. The declaration of T
's type parameters may need to meet some constraints. After the argument Ti
is replaced, these constraints must be implicitly introduced into the current context of the declaration. For example:
interface Eq<T> {
func eq(other: T): Bool
}
interface Ord<T> where T <: Eq<T> {
func lt(other: T): Bool
}
func foo<T>(a: T) where T <: Ord<T> {
a.eq(a)
a.lt(a)
}
For the foo
function, even though only T
is constrained by Ord
, if Ord
's T
type is constrained by Eq
, then the eq
function from Eq
can be used within the foo
function. In this way, the constraint of the foo
function is actually T <: Eq & Ord
. Therefore, when a generic parameter meeting a constraint is declared, the constraints on the upper bounds are also introduced.
This rule of implicitly introducing constraints on upper bounds is also applicable to other generic declarations. For example:
interface A {}
class B<T> where T <: A {}
class C
Note: Although the constraints of the upper bound in the current declaration are implicitly introduced, the current declaration can still explicitly write these constraints.
## Definitions of Generic Functions and Generic Types
### Generic Functions
If a function declares one or more type parameters, it is called a generic function. In terms of syntax, the type parameters follow the function name and are enclosed in `<>`. If there are multiple type parameters, they are separated by `,`.
```cangjie
func f<T>() {...}
func f1<T1, T2, T3, ...>() {...}
The syntax of a generic function is as follows:
functionDefinition
: functionModifierList? 'func' identifier
typeParameters functionParameters
(':' type)? genericConstraints?
block?
;
Note that:
- When
<
and>
are used, they are parsed as generics first. If they are parsed successfully, they are generic expressions. Otherwise, they are comparison operators. For example:
(c <d , e> (f))
This expression is parsed as a function call expression.
Generic Types
If a class
, interface
, struct
, enum
, or type alias definition declares one or more type parameters, it is called a generic type. In syntax, type parameters follow the type name (such as class name or interface name) and are enclosed by <>
. If there are multiple type parameters, they are separated by ,
.
class C<T1, T2> {...} // generic class
interface I<T1> {...} // generic interface
type BinaryOperator<T> = (T, T) -> T // generic typealias
For the syntax of generic declarations, see the corresponding sections.
Generic Type Check
Checking Generic Declarations
Sanity Check for Generic Constraints
For all type parameters in a declaration, the upper bound of each type parameter can be classified into two cases:
-
The upper bound is also a type variable, which may be itself or another type variable.
-
When the upper bound is of a specific type, it can be further divided into two situations:
-
The first situation involves upper bounds that are class or interface types, which are referred to as class-related types.
-
The second situation involves upper bounds that are not class or interface types, which are referred to as class-independent types.
In Cangjie, the upper bounds of one or more specific types for a type variable must meet the following rules:
-
All upper bounds can only be the same case; they must either all be class-related types or class-independent types. For example,
T <: Object & Int32
is invalid. -
When the upper bound is a class-related type, if there are multiple upper bounds of classes, these classes must be on the same inheritance chain. This restriction does not apply to interfaces. Conflicting member definitions are not allowed in multiple generic upper bounds of a type variable. Specifically, conflicts refer to functions with the same name or operators that do not constitute overloads and there is no subtype relationship between return types.
-
When the upper bound is a class-independent type, only one class-independent specific type can be included. Two different types are not allowed. For example,
T <: Int32 & Bool
is invalid. -
Recursive constraints cannot exist when the upper bound of a type variable is a class-independent type. A recursive generic constraint is one where the upper bound's type parameter directly or indirectly depends on the lower bound type variable. For example, the recursive generic constraint
T <: Option<T>
is invalid becauseOption
is declared using theenum
keyword. Similarly,T <: U, U <: (Int32) -> T
is invalid because the function is of the value type and theT
type indirectly depends on itself throughU
.
-
Type Compatibility Check
The purpose of type checking for generic declarations is to ensure that the generic type is compatible with its surrounding type context and the access to member functions and variables is valid. For example:
open class C {
func coo() {...}
}
class D <: C {...}
interface Tr {
func bar(): Int64
}
func foo<T>(a: T) {
var b: C = a // error, T is not a subtype of C
a.coo() // error, T has no member function coo
a.bar() // error, T did not implement Tr
}
Three errors are reported in the foo
function body of the preceding sample code. The causes are as follows:
-
The expected type of the variable
b
declared in thefoo
function isC
. Therefore, you need to check whetherT
is a subtype ofD
, that is,T <: C
. However, this constraint does not exist in the context ofT
. As a result, an error is reported during compilation of the variable declaration. -
The generic type
T
is irrelevant toC
in the current context. Therefore, the member functioncoo
ofC
cannot be accessed. -
Similarly, the
T
type does not have the constraint ofTr
. Therefore, the member functionbar
ofTr
cannot be accessed.
To pass the type check, add a generic constraint before the declaration body.
open class C {
func coo() {...}
}
interface Tr {
func bar(): Int64
}
func foo<T>(a: T) where T <: C & Tr {
var b: C = a // OK, T is a sub type of C now
a.coo() // OK, T is a sub type of C, so it has coo member function
a.bar() // OK, T is constrained by Tr
}
In particular, if a generic upper bound of a type variable T
includes a function type or a type that overloads the function call operator ()
, a value of type T
may be considered as a function call. When the upper bound is a function type, the return type of the call is a return type of an upper bound of T
. When the upper bound is a type that overloads the function call operator ()
, the return type of the call is a return type of a matched function call operator in the upper bound type.
Checking the Use of Generic Declarations
The primary purpose of checking the use of generic declarations is to substitute the arguments into the generic declaration's type parameters, and then verify whether the constraints are valid.
If the C
type is used to call the foo
function defined in the previous section:
main(): Int64 {
foo<C>(C()) // error C does not implement Tr
return 0
}
In this case, an error is reported indicating that type C
does not implement Tr
. This is because the constraints in the foo
function specify T <: C & Tr
. When T
is substituted with C
, C <: C
is valid, but C <: Tr
is invalid.
If the declaration of implementing Tr
is added to the C
type, the T <: Tr
constraint can be met.
extend C <: Tr {
func bar(): Int64 {...}
}
In particular, when an interface is used as a generic constraint, the type instantiated for the generic variable must fully implement the interface static function in all upper bound constraints.
This means that if a static function exists in an interface that is used as a generic constraint, the interface or abstract class that does not implement the corresponding static function cannot be used as the type instantiated for the generic variable.
interface I {
static func f(): Unit
}
func g<T>(): Unit where T <: I {}
main() {
g<I>() // error
return 0
}
Depth of Generic Instantiation
To prevent infinite loops or memory exhaustion during generic instantiation, the number of instantiation layers is limited during compilation. For example:
class A<T>{
func test(a: A<(A<T>, A<T>)>): Bool {true}
}
main(): Int64 {
var a : A<Int32> = A<Int32>()
return 0
}
This code throws an infinite instantiation
error.
Generic Instantiation
The process of forming a corresponding non-generic language structure from a generic declaration after all type parameters are determined is called generic instantiation.
Instantiation of Generic Functions
func show<T>(a: T) where T <: ToString {
a.toString()
}
After T
is substituted with Int32
, the following instance is generated (assuming that show$Int32
is the internal representation after compiler instantiation; similar representation is used in the following):
func show$Int32(a: Int32) {
a.toString()
}
Restrictions on Instantiating Generic Functions
In Cangjie, generic functions cannot be declared in the following cases:
- Non-static abstract functions in interfaces and abstract classes
- Instance member functions in classes and abstract classes marked with the open keyword
- Operator overloading functions
For example, the declaration and definition of the following functions are invalid.
abstract class AbsClass {
public func foo<T>(a: T): Unit // error: abstract generic function in abstract class
public open func bar<T>(a: T) { // error: open generic function in abstract class
...
}
}
interface IF {
func foo<T>(a: T): Unit // error: abstract generic function in interface
}
open class Foo {
public open func foo<T>(a: T): Unit { // error: open generic function in class
...
}
}
The following generic functions are valid.
class Foo {
static func foo<T>(a: T) {...} // generic static function in class
func bar<T>(a: T) {...} // generic non-open function in class
}
abstract class Bar {
func bar<T>(a: T) {...} // generic non-open function in abstract class
}
struct R {
func foo<T>(a: T) {...} // generic function in struct
}
enum E {
A | B | C
func e<T>(a: T) {...} // generic function in enum
}
Instantiation of Classes and Interfaces
class Foo<T> <: IBar<T>{
var a: T
init(a: T) {
this.a = a
}
static func foo(a: T) {...}
public func bar(a: T, b: Int32): Unit {...}
}
interface IBar<T> {
func bar(a: T, b: Int32): Unit
}
When T=Int32
is specified, the declaration of the following instance is generated:
class Foo$Int32 <: IBar$Int32 {
var a: Int32
static func foo(a: Int32) {...}
func bar(a: Int32) {...}
}
interface IBar$Int32 {
func bar(a: Int32, b: Int32)
}
Instantiation of Structs
The instantiation of a struct is very similar to that of a class.
struct Foo<T> {
func foo(a: T) {...}
}
When T=Int32
is specified, the declaration of the following instance is generated:
struct Foo$Int32 {
func foo(a: Int32) {...}
}
Instantiation of Enums
enum Either<T,R> {
Left(T)
| Right(R)
}
When Either
is given the parameters Int32
and Bool
, the type is instantiated as follows:
enum Either$Int32$Bool {
Left(Int32)
| Right(Bool)
}
When a generic declaration is used, for example, a generic function is called or a value of a generic type is constructed, what actually take effect are instances with all type parameters determined during compilation. That is, instantiation occurs only after all generic parameters are of a specific type.
Generic Function Overloading
Cangjie supports overloading between generic functions as well as between generic functions and non-generic functions. For details about overloading, see [Function Overloading].
When a function is called, overloading is processed as follows:
Step 1: Construct a candidate set for function calls. Functions included in this set must pass type checks and are callable. For details, see [Candidate Set of Overloaded Functions]. There are additional rules for generic functions, which will be detailed below.
Step 2: Select the best matching function based on the scope priority rules (see [Scope Priority]) and the best matching rules (see [Best Matching Rules]). If the unique best matching function is not determined, an error is reported.
Step 3: If there are multiple argument types, determine the parameter types based on the best matching function. If the unique argument type is not determined, an error is reported.
When constructing the candidate set for function calls, pay attention to the following points for generic functions:
1. When a function is called, for example, generic function f
, the candidate set may include either a partially instantiated generic function or a fully instantiated function. Which form enters the candidate set depends on the form of the call expression.
-
The call expression is in the format of
C<TA>.f(A)
, that is,f
is a static member function of a generic type. The type is instantiated first, and then the type check of function call for the static member function of the instantiated type is performed. The function enters the candidate set after it passes the check. IfC
has a type parameterX
, the function that enters the candidate set isf'
, whereX
is replaced withTA
inf
.$$ \sigma = [X \mapsto TA] $$ $$ f' = \sigma f $$
// The context contains the following types: Base, Sub, and Sub <: Base. class F<X> { static func foo<Y>(a: X, b: Y) {} // foo1 static func foo<Z>(a: Sub, b: Z) {} // foo2 } /* The functions that enter the candidate set are foo1 and partial instantiated foo2: foo<Y>(a:Base, b: Y) and foo<Z>(a: Sub, b: Z) */ var f = F<Base>.foo(Sub(), Base()) // foo2
-
The format of the call expression is
obj.f(A)
, andobj
is an instance of the instantiated type of the generic type. That is,f
is a non-static member function of a generic type.In the expression, the type of
obj
needs to be determined first, and then the candidate set function is chosen based on the type ofobj
. The type ofobj
is the instantiated type. The type check of function call for the static member function of the instantiated type is performed. The function enters the candidate set after it passes the check.// The context contains the following types: Base, Sub, and Sub <: Base. class C<T, U>{ init (a: T, b: U) {} func foo(a: T, b: U) {} // foo1 func foo(a: Base, b: U) {} // foo2 } /* It is inferred that the type of obj is C<Sub, Rune>. The functions that enter the candidate set are instantiated foo1, foo2: foo(a:Sub, b:Rune) and foo(a: Base, b: Rune) */ main() { C(Sub(), 'a').foo(Sub(), 'a') // choose foo1 return 0 }
2. If no type parameter is provided when the function is called, that is, the function is called in the form of f(a)
, the generic function f
that enters the candidate set must meet the following requirements:
-
Assuming
f
is a generic function in the format off<X_1,...,X_m>(p1: T_1, ..., pn: T_n): R
. No type parameter is provided in the call expression, so the format isf(a_1, ..., a_n)
.f
can infer a group of type parametersTA_1, ...,TA_m
based on the parameter type(A1, ...,An)
, which meet all generic constraints off
. In addition, whenX_1, ...,X_m
are replaced withTA_1, ...,TA_m
inf
, it can pass the type check of function call. The check rules are as follows:- After the inferred type parameter
TA_1, ...,TA_m
are substituted into the function parameter(T1, ...,Tn)
off
,(A1, ...,An)
are the subtype of the substituted parameter type in the context where the call expression is located.
$$ \sigma = [X_1 \mapsto TA_1, ..., X_m \mapsto TA_m] $$ $$ \Delta \vdash (A1, ..., An) <: \sigma (T_{1},...,T_{n}) $$
- If the call expression provides the return type
RA
, type check needs to be performed based on the return type. AfterX_1, ...,X_m
in the return typeR
off
are replaced withTA_1, ...,TA_m
, the substituted return type is the subtype ofRA
in the context where the call expression is located.
- After the inferred type parameter
$$ \sigma = [X_1 \mapsto TA_1, ..., X_m \mapsto TA_m] $$ $$ \Delta \vdash \sigma R <: RA $$
3. If type parameters are provided during function call, that is, the function call format is f<TA>(a)
, the f
that enters the candidate set must meet the following requirements:
- The number of type parameters of
f
is the same as that ofTA
. The type parameterTA
meets the generic constraint off
. After the type parameters are substituted, the function can pass the type check of function call.