The Chinese government requires support of the GB18030 code page in all products sold in China. GB18030 is a Chinese standard that specifies a code page and a mapping table to Unicode.
The key points in this standard are:
This means that GB18030 characters can be represented using Unicode. We use:
The basic character set that the compiler processes is UTF-8.
The user source does not have to be encoded in UTF-8 because the compiler converts it into UTF-8 for the internal processing.
To generate wide characters and string literals in UTF-32, use the LOCALETYPE(*LOCALEUTF) option with:
When LOCALETYPE(*LOCALEUTF) is specified, wide-character literals are always represented in UTF-32 format, regardless of the CCSID used by the root source file. In addition, #pragma convert() has no effect on wide-character literals.
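As an illustration of what UTF-32 wide-character literals imply, the following minimal sketch checks that two sample literals carry their Unicode code point values. The function name is hypothetical, and the sketch assumes the compiler accepts C99 universal character names (\u00AC); it is a sanity check, not a description of how the compiler itself works.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when wchar_t is 4 bytes wide and two
   sample wide-character literals hold their Unicode code point values,
   which is what a UTF-32 representation would produce. Returns 0 otherwise. */
int wide_literals_look_like_utf32(void)
{
    return sizeof(wchar_t) == 4
        && L'A' == 0x41          /* U+0041 LATIN CAPITAL LETTER A */
        && L'\u00AC' == 0xAC;    /* U+00AC NOT SIGN */
}
```

A result of 0 suggests that the module was not compiled with a 4-byte UTF-32 wchar_t.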
The LOCALETYPE(*LOCALEUTF) option requires that the target CCSID be 1208. When a default or specified target CCSID does not map to 1208:
A new-line character ('\n') is converted to the value 0x0a regardless of the SYSIFCOPT option (*IFS64IO, *IFSIO, or *NOIFSIO).
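One way to observe the value that a narrow character such as '\n' receives is to print the bytes of a string literal in hexadecimal. The helper below is a minimal sketch with a hypothetical name (hex_bytes); it makes no assumption about the character set and simply reports the bytes it finds.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the bytes of s into out as space-separated
   lowercase hex pairs ("41 0a"). Returns the number of characters written
   (excluding the terminating NUL), or -1 if out is too small. */
int hex_bytes(const char *s, char *out, size_t out_size)
{
    size_t len = strlen(s);
    /* Each byte needs 3 characters: two hex digits plus a space,
       where the final space is replaced by the terminating NUL. */
    size_t needed = (len == 0) ? 1 : len * 3;
    size_t i;

    if (out_size < needed)
        return -1;

    out[0] = '\0';
    for (i = 0; i < len; i++) {
        sprintf(out + i * 3, (i + 1 < len) ? "%02x " : "%02x",
                (unsigned char)s[i]);
    }
    return (len == 0) ? 0 : (int)(len * 3 - 1);
}
```

In a module whose narrow literals are UTF-8, calling hex_bytes("A\n", buf, sizeof buf) would be expected to leave "41 0a" in buf; an EBCDIC-based module would report different values.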
Translation of narrow characters includes values above the basic character set. An example of such a character is '¬' (U+00AC), which has the 2-byte value 0xC2AC in UTF-8.
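The 2-byte value quoted above follows from the standard UTF-8 encoding rules. The sketch below encodes a single code point; the function name utf8_encode is hypothetical and the code illustrates the encoding only, not the compiler's internal conversion routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one Unicode code point as UTF-8 into out,
   which must have room for 4 bytes. Returns the number of bytes written,
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not encodable */
        return 0;
    if (cp <= 0xFFFF) {                     /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+00AC, the top bits (cp >> 6) are 2 and the low six bits are 0x2C, giving the bytes 0xC2 and 0xAC.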
When the LOCALETYPE(*LOCALEUTF) option is specified, the compiler predefines 'wchar_t' as an unsigned integer with 4-byte size and alignment (otherwise 'wchar_t' remains an unsigned short integer with 2-byte size and alignment).
When the LOCALETYPE(*LOCALEUTF) option is specified, the definition of wchar_t as an unsigned integer with 4-byte size and alignment is used. This definition is provided in <stdlib.h>.
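Because the width of wchar_t depends on the compile option, code that stores arbitrary code points may want to check that width before assigning. The following is a minimal sketch under that assumption; the names wchar_holds_utf32 and store_code_point are hypothetical and are not part of any IBM header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when wchar_t is wide enough (4 bytes or
   more) to hold any UTF-32 code unit, 0 otherwise. */
int wchar_holds_utf32(void)
{
    return sizeof(wchar_t) >= 4;
}

/* Hypothetical helper: stores code point cp in *out only when it is a
   valid Unicode scalar value and fits in wchar_t. Returns 0 on success
   and -1 on failure, in which case *out is left unchanged. */
int store_code_point(uint32_t cp, wchar_t *out)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;                  /* not a Unicode scalar value */
    if (!wchar_holds_utf32() && cp > 0xFFFF)
        return -1;                  /* does not fit in a 2-byte wchar_t */
    *out = (wchar_t)cp;
    return 0;
}
```

With a 4-byte wchar_t, a supplementary-plane code point such as U+20000 is stored directly; with a 2-byte wchar_t, the sketch rejects it rather than truncating the value.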
(C) Copyright IBM Corporation 1992, 2005. All Rights Reserved.