This topic describes Unicode considerations for positional entries and keyword entries for printer files, including the CCSID keyword for Unicode data in printer files.
Unicode is a universal encoding scheme for written characters and text that enables the exchange of data internationally. A Unicode field can contain all types of characters used on an iSeries server, including ideographic (DBCS) characters. The term code unit is used in this topic to mean the minimal bit combination that can represent a unit of encoded text for processing or interchange.
DDS printer files support two transformation formats (encoding forms) of Unicode: UTF-16 and UCS-2.
A UTF-16 code unit is 2 bytes in length. A UTF-16 character can be 1 or 2 code units (2 or 4 bytes) in length. A UTF-16 data string can contain any character, including UTF-16 surrogates and combining characters.
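The relationship between characters, code units, and bytes can be checked with a short Python sketch. It is not part of DDS. The helper name `utf16_code_units` and the sample strings are chosen here only for illustration.

```python
def utf16_code_units(text: str) -> int:
    """Return the number of 2-byte UTF-16 code units needed to encode text."""
    # "utf-16-be" is used so that no byte order mark is prepended to the result.
    return len(text.encode("utf-16-be")) // 2


# Illustrative samples (assumptions made for this sketch, not taken from DDS).
samples = [
    ("Latin capital A (U+0041)", "A"),
    ("Ideograph (U+6F22)", "\u6f22"),
    ("Musical G clef (U+1D11E), stored as a surrogate pair", "\U0001D11E"),
    ("Letter e followed by combining acute accent (U+0301)", "e\u0301"),
]

for label, text in samples:
    units = utf16_code_units(text)
    print(f"{label}: {units} code unit(s), {units * 2} bytes")
```

The first two samples each occupy one code unit (2 bytes). The G clef lies outside the range a single code unit can address, so it occupies two code units (4 bytes). The last sample is two separate characters, a base letter and a combining character, of one code unit each.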
UCS-2 is a subset of UTF-16 and cannot represent all of the characters that Unicode now defines. UCS-2 is identical to UTF-16 except that UTF-16 also supports combining characters and surrogates. If you do not need combining characters and surrogates, you might choose to use UCS-2.
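As a rough aid for choosing between the two encoding forms, the following Python sketch scans sample data for the two features that the text above attributes to UTF-16 only. The function name `needs_utf16` is hypothetical. Treating any character with a nonzero canonical combining class as a combining character is an assumption of this sketch, and the check says nothing about what a particular printer can render.

```python
import unicodedata


def needs_utf16(text: str) -> bool:
    """Return True if text contains surrogate-encoded or combining characters."""
    for ch in text:
        # Code points above U+FFFF need a surrogate pair (two code units).
        if ord(ch) > 0xFFFF:
            return True
        # Assumption of this sketch: a nonzero canonical combining class
        # marks a combining character.
        if unicodedata.combining(ch) != 0:
            return True
    return False


for sample in ["ABC", "\u6f22", "\U0001D11E", "e\u0301"]:
    choice = "UTF-16" if needs_utf16(sample) else "UCS-2 is sufficient"
    print(f"{sample!r}: {choice}")
```

Running a check like this over representative data before defining the printer file fields gives an early indication of whether UCS-2 is enough.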
(C) Copyright IBM Corporation 1992, 2005. All Rights Reserved.