ILE C/C++ Programmer's Guide

GB18030 Code Page Support

The Chinese government requires support of the GB18030 code page in all products sold in China. GB18030 is a Chinese standard that specifies a code page and a mapping table to Unicode.

The key points in this standard are:

This means that GB18030 characters can be represented using Unicode. We use:

The basic character set that the compiler processes is UTF-8.

The user source does not have to be encoded in UTF-8, because the compiler converts it to UTF-8 for internal processing.

Generating Wide Characters and String Literals in UTF-32

To generate wide characters and string literals in UTF-32, use the LOCALETYPE(*LOCALEUTF) option with:

Considerations

When LOCALETYPE(*LOCALEUTF) is specified, wide-character literals are always represented in UTF-32 format, regardless of the CCSID used by the root source file. In addition, #pragma convert() has no effect on wide-character literals.

The LOCALETYPE(*LOCALEUTF) option requires that the target CCSID be 1208. When a default or specified target CCSID does not map to 1208:

A new-line character ('\n') is converted to the value 0x0a regardless of the SYSIFCOPT option (*IFS64IO, *IFSIO, or *NOIFSIO).

Translation of narrow characters includes values above the basic character set. An example of such a character is '¬', which has the 2-byte value 0xC2AC in UTF-8.

C++ language only: When the LOCALETYPE(*LOCALEUTF) option is specified, the compiler predefines 'wchar_t' as an unsigned integer with 4-byte size and alignment. Otherwise, 'wchar_t' remains an unsigned short integer with 2-byte size and alignment.

C language only: When the LOCALETYPE(*LOCALEUTF) option is specified, 'wchar_t' is defined as an unsigned integer with 4-byte size and alignment. This definition is provided in <stdlib.h>.

