This appendix describes Unicode considerations for the positional entries and keyword entries for database (physical and logical) files.
Unicode is a universal encoding scheme for written characters and text that enables the exchange of data internationally. A Unicode field can contain all types of characters used on an iSeries(TM) server, including ideographic (DBCS) characters. Unicode data is composed of code units; a code unit is the minimal byte combination that can represent a unit of text.
Physical and logical file DDS supports three transformation formats (encoding forms) of Unicode:
A UTF-8 code unit is 1 byte in length. A UTF-8 character can be 1, 2, 3, or 4 code units (1 to 4 bytes) in length. A UTF-8 data string can contain any character, including combining characters and the characters that UTF-16 represents with surrogates.
A UTF-16 code unit is 2 bytes in length. A UTF-16 character can be 1 or 2 code units (2 or 4 bytes) in length. A UTF-16 data string can contain any character, including UTF-16 surrogates and combining characters.
UCS-2 is a subset of UTF-16 and can no longer support all of the characters defined by Unicode. UCS-2 is identical to UTF-16, except that UTF-16 also supports combining characters and surrogates. If you do not need support for combining characters and surrogates, you can use the UCS-2 type, because more database functionality is available for it.
In the following topics, references to UTF-16 imply UCS-2 as well.
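The code unit lengths described above can be checked with a short sketch. It is written in Python purely for illustration and is not part of DDS. The function name code_units is hypothetical. The third value reports only whether a string needs UTF-16 surrogates, and it says nothing about combining characters.

```python
def code_units(text):
    """Return (UTF-8 code units, UTF-16 code units, needs_surrogates) for text.

    A UTF-8 code unit is 1 byte and a UTF-16 code unit is 2 bytes, so the
    counts follow directly from the length of the encoded data.
    """
    utf8_units = len(text.encode("utf-8"))
    # "utf-16-be" is used so that no byte order mark is added to the count.
    utf16_units = len(text.encode("utf-16-be")) // 2
    # Code points above U+FFFF are represented in UTF-16 as a surrogate pair.
    needs_surrogates = any(ord(ch) > 0xFFFF for ch in text)
    return utf8_units, utf16_units, needs_surrogates


print(code_units("A"))           # (1, 1, False)
print(code_units("\u65e5"))      # ideographic character: (3, 1, False)
print(code_units("\U0001F600"))  # outside the 16-bit range: (4, 2, True)
```

A string for which the third value is True contains at least one character that takes two UTF-16 code units (4 bytes), which is the surrogate case distinguished above.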
Positional and keyword entry considerations for database files that use Unicode:
The following topics describe how to specify DDS positions 30 through 37 and positions 45 through 80 when describing database files. Positions not mentioned have no special considerations for Unicode.
(C) Copyright IBM Corporation 1992, 2005. All Rights Reserved.