Processing UTF-8 data
To process UTF-8 data, first convert it to UTF-16 in a national data item. After processing the national data, convert it back to UTF-8 for output. For these conversions, use the intrinsic functions NATIONAL-OF and DISPLAY-OF, respectively. Use code page 1208 for UTF-8 data, specifying it as the second argument of each function.
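A minimal sketch of this round trip, assuming Enterprise COBOL in fixed source format. The program name U8ROUND, the data names, and the sample input (the text A,B,C written as UTF-8 hexadecimal bytes, so that it does not depend on the source code page) are hypothetical. Replacing commas with semicolons only stands in for whatever processing a real program does.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. U8ROUND.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical input: "A,B,C" encoded in UTF-8 (5 bytes).
       01  UTF8-In        PIC X(5)  VALUE X"412C422C43".
       01  Nat-Work       PIC N(5)  USAGE NATIONAL.
       01  UTF8-Out       PIC X(5).
       PROCEDURE DIVISION.
      * UTF-8 (code page 1208) to UTF-16 national data.
           MOVE FUNCTION NATIONAL-OF(UTF8-In, 1208) TO Nat-Work
      * Process the national data: replace each comma (U+002C)
      * with a semicolon (U+003B).
           INSPECT Nat-Work REPLACING ALL NX"002C" BY NX"003B"
      * UTF-16 national data back to UTF-8 for output.
           MOVE FUNCTION DISPLAY-OF(Nat-Work, 1208) TO UTF8-Out
      * UTF8-Out now holds "A;B;C" in UTF-8 (X"413B423B43").
           GOBACK.
```

The NX literals are national hexadecimal literals, so the INSPECT operands are UTF-16, like the item being inspected.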
National data is encoded in UTF-16, which uses one encoding unit for almost all commonly encountered characters. Because of this property, you can use string operations such as reference modification on national data and count positions in characters rather than bytes. If it is more convenient to keep the data in UTF-8, use the Unicode intrinsic functions to help process it. For details, see Using intrinsic functions to process UTF-8 encoded data.
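The following sketch shows reference modification on national data. Its hypothetical UTF-8 input encodes the Greek capital letters alpha, beta, and gamma (U+0391 through U+0393), two bytes each. After conversion, each letter occupies one national character position, so the reference modifier (2:1) selects beta. The program and data names are illustrative only.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NATREFM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical input: alpha, beta, gamma in UTF-8 (6 bytes).
       01  UTF8-In        PIC X(6)  VALUE X"CE91CE92CE93".
       01  Nat-Work       PIC N(3)  USAGE NATIONAL.
       01  Nat-Second     PIC N(1)  USAGE NATIONAL.
       01  UTF8-Second    PIC X(2).
       PROCEDURE DIVISION.
           MOVE FUNCTION NATIONAL-OF(UTF8-In, 1208) TO Nat-Work
      * Reference modification on national data counts UTF-16
      * encoding units, so (2:1) selects the second letter, beta.
           MOVE Nat-Work(2:1) TO Nat-Second
           MOVE FUNCTION DISPLAY-OF(Nat-Second, 1208) TO UTF8-Second
      * UTF8-Second now holds X"CE92", the UTF-8 form of beta.
           GOBACK.
```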
Take the following steps to convert ASCII or EBCDIC data to UTF-8:
- Use the function NATIONAL-OF to convert the ASCII or EBCDIC string to a national (UTF-16) string.
- Use the function DISPLAY-OF to convert the national string to UTF-8.
The following example converts Greek EBCDIC data to UTF-8:
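A sketch of the two steps, assuming the Greek EBCDIC data is in CCSID 875 (EBCDIC Greek) and contains exactly 10 Greek letters. The program and data names are hypothetical, and Greek-EBCDIC is assumed to be filled before the conversion, for example by a READ.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. GRK2U8.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Assumption: populated before the conversion (for example,
      * by a READ) with 10 Greek letters in EBCDIC Greek, CCSID 875.
       01  Greek-EBCDIC   PIC X(10).
       01  Greek-Nat      PIC N(10) USAGE NATIONAL.
      * A Greek letter takes 2 bytes in UTF-8: 10 letters, 20 bytes.
       01  Greek-UTF8     PIC X(20).
       PROCEDURE DIVISION.
      * Step 1: EBCDIC Greek (CCSID 875) to national (UTF-16).
           MOVE FUNCTION NATIONAL-OF(Greek-EBCDIC, 875) TO Greek-Nat
      * Step 2: national (UTF-16) to UTF-8 (code page 1208).
           MOVE FUNCTION DISPLAY-OF(Greek-Nat, 1208) TO Greek-UTF8
           GOBACK.
```

If the UTF-8 result is shorter than the receiving item, the alphanumeric MOVE pads it with EBCDIC spaces (X'40'), which read as @ in UTF-8. Size the receiving item for the expected result, or fill the unused bytes separately.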
Usage note: Be careful when you use reference modification on data encoded in UTF-8. UTF-8 encodes characters with a varying number of bytes (one to four), and reference modification of an alphanumeric item counts bytes, not characters. Avoid operations that might split a multibyte character.
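One way to avoid splitting a character when you must cut UTF-8 data by byte length, sketched here as an assumption rather than a prescribed technique: UTF-8 continuation bytes are always in the range X'80' through X'BF', so move the cut to the left while the byte just after it is a continuation byte. The sketch assumes the program uses the native collating sequence, so the comparisons follow byte values. The program name, data names, and input are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. U8TRUNC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical input: alpha, beta, gamma in UTF-8 (6 bytes).
       01  UTF8-Data      PIC X(6)  VALUE X"CE91CE92CE93".
      * Requested cut: the first 3 bytes, which would split beta.
       01  Cut-Len        PIC 9(4)  COMP VALUE 3.
       01  UTF8-Part      PIC X(6).
       PROCEDURE DIVISION.
      * X"80" through X"BF" are UTF-8 continuation bytes. While the
      * byte just after the cut is one, the cut falls inside a
      * character, so move the cut one byte to the left.
      * Assumes the native collating sequence (byte-value order).
           IF Cut-Len < LENGTH OF UTF8-Data
               PERFORM UNTIL Cut-Len = 0
                       OR UTF8-Data(Cut-Len + 1:1) < X"80"
                       OR UTF8-Data(Cut-Len + 1:1) > X"BF"
                   SUBTRACT 1 FROM Cut-Len
               END-PERFORM
           END-IF
      * Fill with UTF-8 spaces (X"20"), not EBCDIC spaces.
           MOVE ALL X"20" TO UTF8-Part
      * Cut-Len is now 2, so only the complete alpha is copied.
           IF Cut-Len > 0
               MOVE UTF8-Data(1:Cut-Len) TO UTF8-Part(1:Cut-Len)
           END-IF
           GOBACK.
```

The outer IF keeps every reference-modified position inside UTF8-Data, because COBOL does not guarantee that the OR conditions are evaluated with short-circuiting.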