UPOS
The UPOS function returns an integer value that is equal to the byte position of the nth UTF-8 or UTF-16 character in a character data item argument that contains UTF-8 or UTF-16 data.
The function type is integer.
- argument-1
- Must be of class alphabetic, alphanumeric, or national.
argument-1 must contain valid UTF-8 or UTF-16 encoded characters:
- If argument-1 is of class alphabetic or alphanumeric, it must contain valid UTF-8 data.
- If argument-1 is of class national, it must contain valid UTF-16 data.
- argument-2
- Must be an integer.
If argument-1 is of class alphabetic or alphanumeric and argument-2 = n, the returned value is the byte position in argument-1 where the nth UTF-8 character starts.
If argument-1 is a national data item and argument-2 = n, the returned value is the byte position in argument-1 where the nth UTF-16 character starts.
If argument-2 is not positive or is larger than ULENGTH(argument-1), zero is returned.
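The rules above can be modeled outside COBOL. The following Python sketch is only an illustration of the documented behavior, not the compiler's implementation. The name `upos_model` is hypothetical, the input is assumed to be valid, and UTF-16 data is assumed to be big-endian, as in the hexadecimal values shown in the examples.

```python
def upos_model(data: bytes, n: int, encoding: str = "utf-8") -> int:
    """Return the 1-based byte position where the nth character starts, or 0.

    Hypothetical model of the rules described above; assumes valid input.
    """
    if encoding == "utf-8":
        # A character starts at every byte that is not a
        # continuation byte (bit pattern 10xxxxxx).
        starts = [i + 1 for i, b in enumerate(data) if b & 0xC0 != 0x80]
    elif encoding == "utf-16":
        # Big-endian 2-byte code units (assumption). A low surrogate
        # (DC00-DFFF) is the second half of a pair, not a new character.
        starts = []
        for i in range(0, len(data), 2):
            unit = int.from_bytes(data[i:i + 2], "big")
            if not 0xDC00 <= unit <= 0xDFFF:
                starts.append(i + 1)
    else:
        raise ValueError("encoding must be 'utf-8' or 'utf-16'")

    # len(starts) plays the role of ULENGTH(argument-1).
    if n < 1 or n > len(starts):
        return 0
    return starts[n - 1]
```

For instance, `upos_model(bytes.fromhex("4BC3A4666572"), 3)` gives 4, and any n below 1 or above 5 gives 0 for the same input.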
Example 1
If A is an alphanumeric item that contains the UTF-8 value x'4BC3A4666572' ('Käfer'), the returned values are as follows:
- UPOS(A 1) returns 1
- UPOS(A 2) returns 2
- UPOS(A 3) returns 4
- UPOS(A 4) returns 5
- UPOS(A 5) returns 6

Example 2
If B is a national item that contains the UTF-16 value x'005400F6006200750072D858DC6B0073' ('Töbur𦁫s'), the returned values are as follows:
- UPOS(B 1) returns 1
- UPOS(B 2) returns 3
- UPOS(B 3) returns 5
- UPOS(B 4) returns 7
- UPOS(B 5) returns 9
- UPOS(B 6) returns 11
- UPOS(B 7) returns 15
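
The jump from 11 to 15 in the last two values comes from the surrogate pair x'D858DC6B', which occupies four bytes. As a rough cross-check, the Python sketch below decodes the same bytes and accumulates each character's width. It assumes big-endian UTF-16 and is an illustration only, not COBOL code.

```python
# Same byte sequence as item B in Example 2 (assumed big-endian UTF-16).
data = bytes.fromhex("005400F6006200750072D858DC6B0073")
text = data.decode("utf-16-be")  # strict decoding raises on invalid UTF-16

positions = []
offset = 1  # byte positions are 1-based
for ch in text:
    positions.append(offset)
    # Code points above U+FFFF need a surrogate pair (4 bytes);
    # all others fit in a single 2-byte code unit.
    offset += 4 if ord(ch) > 0xFFFF else 2

print(positions)  # [1, 3, 5, 7, 9, 11, 15]
```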


Example 3
If argument-1 is a UTF-8 encoded item that contains characters composed of a base character and combining characters, the combining characters are counted individually. For example, when encoded in UTF-8, the Unicode character ä can be x'C3A4' (precomposed form) or x'61CC88' (letter a followed by a combining diaeresis). The values that the UPOS function returns differ depending on which of the two encodings argument-1 contains. See the following table for details.
argument-1 | Unicode encoding | UTF-8 encoding | Returned values of the UPOS function |
---|---|---|---|
C = äK | U+00E4 + U+004B (precomposed form, latin small letter a with diaeresis + latin capital letter K) | x'C3A44B' (äK) | UPOS(C 1) returns 1; UPOS(C 2) returns 3; UPOS(C 3) returns 0 |
C = äK | U+0061 + U+0308 + U+004B (canonical decomposition, latin small letter a + combining diaeresis + latin capital letter K) | x'61CC884B' (äK) | UPOS(C 1) returns 1; UPOS(C 2) returns 2; UPOS(C 3) returns 4 |
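
To see which form a given string takes, both encodings can be reproduced with Unicode normalization. The Python sketch below is an illustration only: `char_starts_utf8` is a hypothetical helper, and normalization forms NFC and NFD are used here as a way to obtain the precomposed and decomposed forms, which this page does not itself mention.

```python
import unicodedata

def char_starts_utf8(text: str) -> list:
    """1-based byte position of each code point in the UTF-8 form of text."""
    starts, offset = [], 1
    for ch in text:
        starts.append(offset)
        offset += len(ch.encode("utf-8"))
    return starts

# NFC yields the precomposed form, NFD the canonical decomposition.
composed = unicodedata.normalize("NFC", "a\u0308K")    # U+00E4 U+004B
decomposed = unicodedata.normalize("NFD", "\u00e4K")   # U+0061 U+0308 U+004B

print(composed.encode("utf-8").hex().upper(), char_starts_utf8(composed))
# C3A44B [1, 3]
print(decomposed.encode("utf-8").hex().upper(), char_starts_utf8(decomposed))
# 61CC884B [1, 2, 4]
```

Normalizing data to one form before it reaches the program is one possible way to get consistent positions, at the cost of changing the stored bytes.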
