Identifiers
In Cangjie, developers can assign names to program elements. These names are called identifiers.
Before learning about identifiers, you need to understand a few Unicode concepts. The Unicode standard defines the XID_Start and XID_Continue properties to mark, respectively, the characters that can start a Unicode identifier and the characters that can follow the start character. For their definitions, see the Unicode Standard documentation. XID_Start includes, among others, Chinese characters and English letters. XID_Continue includes, among others, Chinese characters, English letters, and Arabic numerals. Cangjie uses Unicode 15.0.0.
Identifiers in Cangjie are classified into common identifiers and raw identifiers, which comply with different naming rules.
A common identifier cannot be a keyword in Cangjie. It must be formed from one of the following two types of character sequences:
- Starting with an XID_Start character, followed by XID_Continue characters of any length.
- Starting with _, followed by at least one XID_Continue character.
For definitions of XID_Start and XID_Continue, see Unicode Standard. Cangjie uses the Unicode standard 15.0.0.
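As a rough illustration of the two rules above, the following Python sketch checks whether a string is a common identifier. It is not part of Cangjie or its toolchain: `is_xid_start`, `is_xid_continue`, `is_common_identifier`, and the `KEYWORDS` set are hypothetical names, and the keyword set contains only the two keywords mentioned in this section, not the full Cangjie keyword list. The checks rely on Python's `str.isidentifier()`, which is defined in terms of XID_Start and XID_Continue but uses the Unicode version bundled with the interpreter, so edge cases may differ from Unicode 15.0.0.

```python
# Illustrative subset only: the full Cangjie keyword list is longer.
KEYWORDS = {"if", "while"}


def is_xid_start(ch: str) -> bool:
    # str.isidentifier() accepts XID_Start characters plus "_" in first position,
    # so exclude "_" to approximate XID_Start.
    return ch != "_" and ch.isidentifier()


def is_xid_continue(ch: str) -> bool:
    # Prefixing a known start character tests ch in a non-initial position.
    return ("a" + ch).isidentifier()


def is_common_identifier(name: str) -> bool:
    if not name or name in KEYWORDS:
        return False
    first, rest = name[0], name[1:]
    if first == "_":
        # Rule 2: "_" must be followed by at least one XID_Continue character.
        if not rest:
            return False
    elif not is_xid_start(first):
        # Rule 1: otherwise the first character must be XID_Start.
        return False
    return all(is_xid_continue(ch) for ch in rest)


for candidate in ["abc", "_abc", "仓颉", "ab&c", "3abc", "_", "while"]:
    print(candidate, is_common_identifier(candidate))
```

The sketch does not normalize its input. A real implementation would also need the complete keyword list and the exact Unicode version that Cangjie specifies.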
In Cangjie, all identifiers are recognized in Normalization Form C (NFC). Two identifiers that are equal after normalization to NFC are considered the same identifier.
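To make the NFC rule concrete, the Python sketch below compares two spellings of the same name using the standard `unicodedata` module. The helper name `same_identifier` is hypothetical and only illustrates the comparison. It does not show how the Cangjie compiler implements it.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    # Two spellings name the same identifier if their NFC forms are equal.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # "é" as the single code point U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                 # False: different code points
print(same_identifier(precomposed, decomposed))  # True: equal after NFC
```

Under the rule stated above, both spellings would refer to the same identifier even though their code point sequences differ.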
For example, all of the following strings are valid common identifiers:
abc
_abc
abc_
a1b2c3
a_b_c
a1_b2_c3
仓颉
__こんにちは
The following character strings are invalid common identifiers:
```text
ab&c   // & is not an XID_Continue character.
3abc   // An Arabic numeral is not an XID_Start character, so it cannot be the start character.
_      // At least one XID_Continue character is required after the underscore (_).
while  // while is a keyword in Cangjie and therefore cannot be used as a common identifier.
```
A raw identifier is a common identifier or Cangjie keyword enclosed in a pair of backticks (`). Typically, it is needed when you use Cangjie keywords as identifiers.
For example, all of the following strings are valid raw identifiers:
`abc`
`_abc`
`a1b2c3`
`if`
`while`
`à֮̅̕b`
In each of the following character strings, the part enclosed in backticks is an invalid common identifier. Therefore, they are all invalid raw identifiers.
`ab&c`
`3abc`
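The raw identifier rule can be sketched in Python in the same spirit. The names `is_common_identifier`, `is_raw_identifier`, and `KEYWORDS` are hypothetical, the keyword set lists only the two keywords used in this section, and `str.isidentifier()` serves as an approximation of the XID_Start and XID_Continue checks based on the Unicode version bundled with the interpreter.

```python
# Illustrative subset only: the full Cangjie keyword list is longer.
KEYWORDS = {"if", "while"}


def is_common_identifier(name: str) -> bool:
    # str.isidentifier() follows the XID_Start / XID_Continue rules and also
    # allows a leading "_"; a lone "_" and keywords are rejected separately.
    return name.isidentifier() and name != "_" and name not in KEYWORDS


def is_raw_identifier(text: str) -> bool:
    # A raw identifier is a common identifier or a keyword wrapped in backticks.
    if len(text) < 3 or not (text.startswith("`") and text.endswith("`")):
        return False
    inner = text[1:-1]
    return inner in KEYWORDS or is_common_identifier(inner)


for candidate in ["`abc`", "`if`", "`while`", "`ab&c`", "`3abc`"]:
    print(candidate, is_raw_identifier(candidate))
```

The first three candidates are accepted and the last two are rejected, matching the examples above.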