General Description

The following characteristics help define the rules used by DBCS to represent extended characters:

Note:
In EBCDIC, the shift-out (SO) and shift-in (SI) characters distinguish DBCS characters from SBCS characters.

Enabling DBCS Data Operations and Symbol Use

The OPTIONS instruction controls how REXX regards DBCS data. To enable DBCS operations, use the EXMODE option. To enable DBCS symbols, use the ETMODE option on the OPTIONS instruction; this must be the first instruction in the program. (See the description of the OPTIONS instruction for more information.)

If OPTIONS ETMODE is in effect, the language processor checks that SO and SI are paired in comments. Otherwise, the contents of a comment are not checked. The comment delimiters (/* and */) must be SBCS characters.

Symbols and Strings

In DBCS, there are DBCS-only symbols and strings and mixed symbols and strings.

DBCS-Only Symbols and Mixed SBCS/DBCS Symbols

A DBCS-only symbol consists of only non-blank DBCS codes as indicated in Table 5.

A mixed DBCS symbol is formed by a concatenation of SBCS symbols, DBCS-only symbols, and other mixed DBCS symbols. In EBCDIC, the SO and SI bracket the DBCS symbols and distinguish them from the SBCS symbols.

The default value of a DBCS symbol is the symbol itself, with SBCS characters translated to uppercase.

A constant symbol must begin with an SBCS digit (0-9) or an SBCS period. The delimiter (period) in a compound symbol must be an SBCS character.

DBCS-Only Strings and Mixed SBCS/DBCS Strings

A DBCS-only string consists of only DBCS characters. A mixed SBCS/DBCS string is formed by a combination of SBCS and DBCS characters. In EBCDIC, the SO and SI bracket the DBCS data and distinguish it from the SBCS data. Because the SO and SI are needed only in the mixed strings, they are not associated with the DBCS-only strings.

In EBCDIC:

DBCS-only string      ->      .A.B.C
Mixed string          ->      ab<.A.B>
Mixed string          ->      <.A.B>
Mixed string          ->      ab<.C.D>ef

Validation

The user must follow certain rules and conditions when using DBCS.

DBCS Symbol Validation

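DBCS symbols are valid only if you comply with the following rules: the DBCS portion of the symbol must have an even byte length; SO and SI must not be contiguous; SO and SI must not be nested; SO and SI must be paired; and the DBCS portion of the symbol must not contain a DBCS blank.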

These examples show some possible misuses:

<.A.BC>        ->  Incorrect because of odd byte length
<.A.B><.C>     ->  Incorrect contiguous SO/SI
<>             ->  Incorrect contiguous SO/SI (null DBCS symbol)
<.A<.B>.C>     ->  Incorrectly nested SO/SI
<.A.B.C        ->  Incorrect because SO/SI not paired
<.A. .B>       ->  Incorrect because contains blank
'. A<.B><.C>   ->  Incorrect symbol
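
The symbol checks illustrated above can be modelled outside REXX. The following is a minimal Python sketch, not the language processor's actual algorithm. It assumes the notation used in the examples: < stands for SO, > stands for SI, every other character counts as one byte, and the two-character pair ". " stands for a DBCS blank. The function name check_dbcs_symbol and the reason strings are hypothetical.

```python
SO, SI = "<", ">"  # stand-ins for the EBCDIC shift-out and shift-in bytes


def check_dbcs_symbol(symbol):
    """Return None if the symbol passes the checks, else a short reason."""
    in_dbcs = False   # True while scanning between SO and SI
    run = ""          # bytes collected since the last SO
    prev = ""         # previous character, used to spot contiguous SO/SI
    for ch in symbol:
        if ch == SO:
            if in_dbcs:
                return "nested SO/SI"
            if prev == SI:
                return "contiguous SO/SI"
            in_dbcs, run = True, ""
        elif ch == SI:
            if not in_dbcs:
                return "SO/SI not paired"
            if prev == SO:
                return "contiguous SO/SI"
            if len(run) % 2:
                return "odd byte length"
            # Walk the run two bytes at a time and look for a DBCS blank.
            if any(run[i:i + 2] == ". " for i in range(0, len(run), 2)):
                return "contains blank"
            in_dbcs = False
        elif in_dbcs:
            run += ch
        prev = ch
    return "SO/SI not paired" if in_dbcs else None


for sample in ["<.A.BC>", "<.A.B><.C>", "<>", "<.A<.B>.C>",
               "<.A.B.C", "<.A. .B>", "ab<.A.B>cd"]:
    print(repr(sample), "->", check_dbcs_symbol(sample))
```

The sketch covers only the SO/SI and length checks; it does not decide whether the SBCS portion is itself a valid REXX symbol.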

Mixed String Validation

The validation of mixed strings depends on the instruction, operator, or function. If you use a mixed string with an instruction, operator, or function that does not allow mixed strings, this causes a syntax error.

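The following rules apply to mixed string validation: the DBCS data in the string must have an even byte length.

EBCDIC only: SO and SI must be paired in a string, and nesting of SO or SI is not permitted.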

These examples show some possible misuses:

'ab<cd'       ->    INCORRECT - not paired
'<.A<.B>.C>'  ->    INCORRECT - nested
'<.A.BC>'     ->    INCORRECT - odd byte length
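
As an illustration, the three misuses above can be detected with a single left-to-right scan. This is a minimal Python sketch under the same notation as the examples (< for SO, > for SI, one character per byte); it is not the REXX implementation, and the name check_mixed_string is hypothetical. Unlike a DBCS symbol, a mixed string may contain an empty SO-SI pair, as the PARSE examples in this section do.

```python
SO, SI = "<", ">"  # stand-ins for the EBCDIC shift-out and shift-in bytes


def check_mixed_string(s):
    """Return None if s is a valid mixed string, else a short reason."""
    in_dbcs = False   # True while scanning between SO and SI
    dbcs_len = 0      # number of bytes seen since the last SO
    for ch in s:
        if ch == SO:
            if in_dbcs:
                return "nested"
            in_dbcs, dbcs_len = True, 0
        elif ch == SI:
            if not in_dbcs:
                return "not paired"
            if dbcs_len % 2:
                return "odd byte length"
            in_dbcs = False
        elif in_dbcs:
            dbcs_len += 1
    # An SO that is still open at the end of the string is unpaired.
    return "not paired" if in_dbcs else None


for sample in ["ab<cd", "<.A<.B>.C>", "<.A.BC>", "ab<.C.D>ef"]:
    print(repr(sample), "->", check_mixed_string(sample))
```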

The end-of-comment delimiter is not recognized within a DBCS character sequence. For example, when the program contains /* < */, the */ is not recognized as ending the comment, because the scan is looking for the > (SI) that matches the < (SO) and is not looking for */.
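
The scanning behavior described here can be pictured with a small model. The Python sketch below is an assumption about the general shape of such a scan, not the language processor's code: it uses < and > for SO and SI, expects the text to start with /*, ignores nested comments, and the name find_comment_end is hypothetical.

```python
SO, SI = "<", ">"  # stand-ins for the EBCDIC shift-out and shift-in bytes


def find_comment_end(source):
    """Return the index of the closing */ of a comment, or -1 if none.

    source is assumed to begin with the opening /* delimiter.
    """
    in_dbcs = False
    i = 2  # skip the opening delimiter
    while i < len(source):
        ch = source[i]
        if in_dbcs:
            # Inside DBCS data only SI is significant; */ is just data.
            if ch == SI:
                in_dbcs = False
        elif ch == SO:
            in_dbcs = True
        elif source.startswith("*/", i):
            return i
        i += 1
    return -1


print(find_comment_end("/* < */"))     # the SO is never closed
print(find_comment_end("/* <.A> */"))  # the */ after the SI ends the comment
```

In the first call the scan never leaves DBCS mode, so no end of comment is found; in the second the SI closes the DBCS sequence and the following */ is recognized.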

When a variable is created, modified, or referred to in a REXX program under OPTIONS EXMODE, its value is checked to determine whether it is a valid mixed string. When a referenced variable contains a mixed string that is not valid, whether a syntax error results depends on the instruction, function, or operator.

The ARG, PARSE, PULL, PUSH, QUEUE, SAY, TRACE, and UPPER instructions all require valid mixed strings with OPTIONS EXMODE in effect.

Instruction Examples

Here are some examples that illustrate how instructions work with DBCS.

PARSE

In EBCDIC:

x1 = '<><.A.B><. . ><.E><.F><>'

PARSE VAR x1 w1
           w1   ->   '<><.A.B><. . ><.E><.F><>'

PARSE VAR x1 1 w1
           w1   ->   '<><.A.B><. . ><.E><.F><>'

PARSE VAR x1 w1 .
           w1   ->   '<.A.B>'

The leading and trailing SO and SI are unnecessary for word parsing and, thus, they are stripped off. However, one pair is still needed for a valid mixed DBCS string to be returned.

PARSE VAR x1 . w2
           w2   ->   '<. ><.E><.F><>'

Here the first blank delimits the word, and an SO is added to the remaining string so that the DBCS blank is kept and the result is a valid mixed string.

PARSE VAR x1 w1 w2
           w1   ->   '<.A.B>'
           w2   ->   '<. ><.E><.F><>'

PARSE VAR x1 w1 w2 .
           w1   ->   '<.A.B>'
           w2   ->   '<.E><.F>'

The word delimiting allows for unnecessary SO and SI to be dropped.

x2 = 'abc<>def <.A.B><><.C.D>'

PARSE VAR x2 w1 '' w2
           w1   ->   'abc<>def <.A.B><><.C.D>'
           w2   ->   ''

PARSE VAR x2 w1 '<>' w2
           w1   ->   'abc<>def <.A.B><><.C.D>'
           w2   ->   ''

PARSE VAR x2 w1 '<><>' w2
           w1   ->   'abc<>def <.A.B><><.C.D>'
           w2   ->   ''

Note that for the last three examples '', <>, and <><> are each a null string (a string of length 0). When parsing, the null string matches the end of string. For this reason, w1 is assigned the value of the entire string and w2 is assigned the null string.

PUSH and QUEUE

The PUSH and QUEUE instructions add entries to the program stack. Because a stack entry is limited to 255 bytes, the value of the expression is truncated if it is 256 bytes or longer. If the truncation splits a DBCS string, REXX ensures that the integrity of the SO-SI pairing is kept under OPTIONS EXMODE.
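
The source does not say how the truncation point is chosen. One plausible approach, shown here as a Python sketch and explicitly an assumption, is to cut at the last position that leaves whole DBCS characters and then close an open DBCS run with SI. The sketch uses < and > for SO and SI, counts one character as one byte, expects a valid mixed string as input, and the name truncate_mixed is hypothetical.

```python
SO, SI = "<", ">"  # stand-ins for the EBCDIC shift-out and shift-in bytes


def truncate_mixed(s, limit=255):
    """Truncate a valid mixed string to at most limit bytes, keeping SO-SI paired."""
    if len(s) <= limit:
        return s
    best = ""         # longest safe prefix found so far
    in_dbcs = False   # True while scanning between SO and SI
    dbcs_bytes = 0    # DBCS bytes seen since the last SO
    for i, ch in enumerate(s):
        if i > limit:
            break
        # Consider cutting just before position i.
        if not in_dbcs:
            candidate = s[:i]
        elif dbcs_bytes and dbcs_bytes % 2 == 0:
            candidate = s[:i] + SI   # close the open DBCS run
        else:
            candidate = None         # would split a character or leave an empty pair
        if candidate is not None and len(candidate) <= limit:
            best = candidate
        if ch == SO:
            in_dbcs, dbcs_bytes = True, 0
        elif ch == SI:
            in_dbcs = False
        elif in_dbcs:
            dbcs_bytes += 1
    return best


print(truncate_mixed("ab<.A.B.C>", limit=6))
```

With a limit of 6 bytes, the string ab<.A.B.C> is cut to ab<.A>: the second DBCS character no longer fits once the closing SI is counted.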

SAY and TRACE

The SAY and TRACE instructions write data to the output stream or display it on the user's terminal. As with the PUSH and QUEUE instructions, REXX guarantees that the SO-SI pairs are kept for any data that is split to meet the requirements of the output stream or the terminal line size. The line size is generally 130 bytes, or fewer if the DIAG-24 value returns a smaller value.

When the data is split into shorter pieces, the DBCS data integrity is again kept under OPTIONS EXMODE. In EBCDIC, if the terminal line size is less than 4, the string is treated as SBCS data, because 4 bytes is the minimum length for mixed string data.

UPPER

Under OPTIONS EXMODE, the UPPER instruction translates only the SBCS characters in the contents of one or more variables to uppercase; it never translates DBCS characters. If the content of a variable is not valid mixed string data, no uppercasing occurs.
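
The behavior described for UPPER can be modelled as follows. This is a minimal Python sketch, not the REXX implementation: it uses < and > for SO and SI, treats every character between them as DBCS data, and the names is_valid_mixed and upper_sbcs are hypothetical.

```python
SO, SI = "<", ">"  # stand-ins for the EBCDIC shift-out and shift-in bytes


def is_valid_mixed(s):
    """True if SO and SI are paired, not nested, and enclose an even byte count."""
    in_dbcs, n = False, 0
    for ch in s:
        if ch == SO:
            if in_dbcs:
                return False
            in_dbcs, n = True, 0
        elif ch == SI:
            if not in_dbcs or n % 2:
                return False
            in_dbcs = False
        elif in_dbcs:
            n += 1
    return not in_dbcs


def upper_sbcs(s):
    """Uppercase only the SBCS portions; leave invalid mixed strings unchanged."""
    if not is_valid_mixed(s):
        return s
    out = []
    in_dbcs = False
    for ch in s:
        if ch == SO:
            in_dbcs = True
        elif ch == SI:
            in_dbcs = False
        # DBCS bytes are copied as they are; SBCS bytes are uppercased.
        out.append(ch if in_dbcs else ch.upper())
    return "".join(out)


print(upper_sbcs("ab<.a.b>cd"))
print(upper_sbcs("ab<cd"))
```

In the first call only ab and cd are uppercased; the second string has an unpaired SO, so it is returned unchanged.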