Processing UTF-8 data

To process UTF-8 data, first convert the UTF-8 data to UTF-16 in a national data item. After processing the national data, convert it back to UTF-8 for output. For the conversions, use the intrinsic functions NATIONAL-OF and DISPLAY-OF, respectively. Use code page 1208 for UTF-8 data.
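The round trip described above might look like the following minimal sketch. The program name, data names, and buffer sizes are hypothetical and chosen only for illustration. The caller is assumed to pass valid UTF-8 data.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8RT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * At most one UTF-16 encoding unit is needed per UTF-8 byte.
       01  UTF16-WORK     PIC N(40) USAGE NATIONAL.
       LINKAGE SECTION.
      * Hypothetical caller-supplied buffers (names are assumptions).
       01  UTF8-IN        PIC X(40).
       01  UTF8-OUT       PIC X(120).
       PROCEDURE DIVISION USING UTF8-IN UTF8-OUT.
      * UTF-8 (code page 1208) to UTF-16 in a national data item.
           MOVE FUNCTION NATIONAL-OF(UTF8-IN, 1208) TO UTF16-WORK
      * ... process the national data in UTF16-WORK here ...
      * UTF-16 back to UTF-8 for output.
           MOVE FUNCTION DISPLAY-OF(UTF16-WORK, 1208) TO UTF8-OUT
           GOBACK.
```

The output buffer is sized at three bytes per UTF-16 encoding unit, which is the largest expansion a single encoding unit can produce in UTF-8.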

National data is encoded in UTF-16, which uses one 2-byte encoding unit for almost all commonly encountered characters (characters outside the Basic Multilingual Plane take two encoding units). Because of this property, you can use string operations such as reference modification on the national data. If it is more convenient to retain the UTF-8 encoding, use the Unicode intrinsic functions to assist with processing the data. For details, see Using intrinsic functions to process UTF-8 encoded data.
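As an illustration of reference modification on national data, the following sketch takes the first three characters of a UTF-16 string. The data names are hypothetical. The value is given as a national hexadecimal literal so that it does not depend on the source code page.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NATREF.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * "Zurich" spelled with u-umlaut (U+00FC): six UTF-16
      * encoding units, one per character.
       01  CITY-UTF16     PIC N(6) USAGE NATIONAL
                          VALUE NX'005A00FC0072006900630068'.
       01  CITY-PREFIX    PIC N(3) USAGE NATIONAL.
       PROCEDURE DIVISION.
      * Position and length count national character positions,
      * so (1:3) selects Z, u-umlaut, and r without splitting any.
           MOVE CITY-UTF16(1:3) TO CITY-PREFIX
           GOBACK.
```

This holds only while every character fits in one encoding unit. A string that contains surrogate pairs needs the same care as UTF-8 data.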

Take the following steps to convert ASCII or EBCDIC data to UTF-8:

  1. Use the function NATIONAL-OF to convert the ASCII or EBCDIC string to a national (UTF-16) string.
  2. Use the function DISPLAY-OF to convert the national string to UTF-8.

The following example converts Greek EBCDIC data to UTF-8:

[Figure: sample code for converting Greek EBCDIC data to UTF-8]
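A minimal sketch of such a conversion follows. It assumes that the Greek EBCDIC data uses CCSID 875. The program name and data names are hypothetical, and the input is assumed to be supplied by the caller.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. GRK2UTF8.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  UTF16-STR      PIC N(10) USAGE NATIONAL.
       LINKAGE SECTION.
      * Hypothetical caller-supplied items (names are assumptions).
       01  GREEK-EBCDIC   PIC X(10).
      * Greek letters take two bytes each in UTF-8.
       01  UTF8-STR       PIC X(20).
       PROCEDURE DIVISION USING GREEK-EBCDIC UTF8-STR.
      * Step 1: Greek EBCDIC (assumed CCSID 875) to UTF-16.
           MOVE FUNCTION NATIONAL-OF(GREEK-EBCDIC, 875)
             TO UTF16-STR
      * Step 2: UTF-16 to UTF-8 (code page 1208).
           MOVE FUNCTION DISPLAY-OF(UTF16-STR, 1208)
             TO UTF8-STR
           GOBACK.
```

The UTF-8 receiving item is twice the size of the EBCDIC item because each single-byte Greek character can become two bytes in UTF-8.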

Usage note: Use care if you use reference modification to refer to data encoded in UTF-8. UTF-8 characters are encoded with a varying number of bytes per character. Avoid operations that might split a multibyte character.
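The following sketch illustrates the hazard and one way around it. It assumes that the Unicode intrinsic function USUBSTR is available in your compiler level and takes a UTF-8 string, a starting character position, and a length in characters. Check the language reference before relying on it. The data names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. U8SUB.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * "Zurich" spelled with u-umlaut in UTF-8: the umlaut is
      * the 2-byte sequence X'C3BC', so 6 characters use 7 bytes.
       01  CITY-UTF8      PIC X(7) VALUE X'5AC3BC72696368'.
      * The first two characters occupy three bytes.
       01  CITY-PREFIX    PIC X(3).
       PROCEDURE DIVISION.
      * CITY-UTF8(1:2) would select X'5AC3' and split the umlaut.
      * USUBSTR (assumed signature) counts characters, not bytes.
           MOVE FUNCTION USUBSTR(CITY-UTF8, 1, 2) TO CITY-PREFIX
           GOBACK.
```

If the Unicode intrinsic functions are not available, converting to national data with NATIONAL-OF, applying reference modification there, and converting back with DISPLAY-OF is an alternative.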