String comparisons

Binary string comparisons

Binary string comparisons always use a sort sequence of *HEX and the corresponding bytes of each string are compared. Additionally, two binary strings are equal only if the lengths of the two strings are identical. If the strings are equal up to the length of the shorter string length, the shorter string is considered less than the longer string even when the remaining bytes in the longer string are hexadecimal zeros. Note that binary strings cannot be compared to character strings unless the character string is cast to a binary string.

Character and graphic string comparisons

Character and Unicode graphic string comparisons use the sort sequence in effect when the statement is executed for all SBCS data and the single-byte portion of mixed data. If the sort sequence is *HEX, the corresponding bytes of each string are compared. For all other sort sequences, the corresponding bytes of the weighted value of each string are compared.

If the strings have different lengths, a temporary copy of the shorter string is padded on the right with blanks before comparison. The padding makes each string the same length. The pad character is always a blank, regardless of the sort sequence. For bit data, the pad character is also a blank. For DBCS graphic data, the pad character is a DBCS blank (x'4040'). For Unicode graphic data, the pad character is a UTF-16 blank. 24

Two strings are equal if any of the following are true:

An empty string is equal to a blank string. The relationship between two unequal strings is determined by a comparison of the first pair of unequal bytes (or bytes of the weighted value) from the left end of the string. This comparison is made according to the sort sequence in effect when the statement is executed.

In an application that will run in multiple environments, the same sort sequence (which depends on the CCSIDs of the environments) must be used to ensure identical results. The following table illustrates the differences between EBCDIC, ASCII, and the DB2 UDB LUW default sort sequence for United States English by showing a list that is sorted according to each one.

Table 17. Sort Sequence Differences
ASCII and Unicode EBCDIC DB2 UDB LUW Default
0000 @@@@ 0000
9999 co-op 9999
@@@@ coop @@@@
COOP piano forte co-op
PIANO-FORTE piano-forte COOP
co-op COOP coop
coop PIANO-FORTE piano forte
piano forte 0000 PIANO-FORTE
piano-forte 9999 piano-forte

Two varying-length strings with different lengths are equal if they differ only in the number of trailing blanks. In operations that select one value from a set of such values, the value selected is arbitrary. The operations that can involve such an arbitrary selection are DISTINCT, MAX, MIN, UNION, EXCEPT, INTERSECT, and references to a grouping column. See the description of GROUP BY for further information about the arbitrary selection involved in references to a grouping column.

Conversion rules for comparison

When two strings are compared, one of the strings is first converted, if necessary, to the coded character set of the other string. Character conversion is necessary only if all of the following are true:

If two strings with different encoding schemes are compared, any necessary conversion applies to the string as follows:

Table 18. Selecting the Encoding Scheme for Character Conversion
First Operand Second Operand
SBCS Data DBCS Data Mixed Data Unicode Graphic Data
SBCS Data see below second second second
DBCS Data first see below second second
Mixed Data first first see below second
Unicode Graphic Data first first first see below

Otherwise, the string selected for conversion depends on the type of each operand. The following table shows which operand is selected for conversion, given the operand types:

Table 19. Selecting the Operand for Character Conversion
First Operand Second Operand
Column Value Derived Value Special Register Constant Variable
Column Value second second second second second
Derived Value first second second second second
Special Register first first second second second
Constant first first first second second
Variable first first first first second

A variable that contains data in a foreign encoding scheme is always effectively converted to the native encoding scheme before it is used in any operation. The above rules are based on the assumption that this conversion has already occurred.

An error is returned if a character of the string cannot be converted or the CCSID Conversion Selection Table (Coded character sets and CCSIDs) is used but does not contain any information about the pair of CCSIDs. A warning occurs if a character of the string is converted to the substitution character.


24.
UTF-16 defines a blank character at code point X'0020' and X'3000'. The database manager pads with the blank at code point X'0020'.