Administration Guide

Character Sets

The database manager does not, in general, restrict the character set available to an application except as noted below.

Each combined Single-Byte Character Set (SBCS) or Double-Byte Character Set (DBCS) code page allows for both single- and double-byte character code points. This is usually accomplished by reserving a subset of the 256 available code points of a mixed code table for single-byte characters, with the remainder of the code points either undefined or allocated to the first byte of double-byte code points. These code points are shown in the following table.

Table 144. Mixed Character Set Code Points

Country Supported Mixed Code Page Code Points for Single-byte Characters Code Points for First Byte of Double-Byte Characters
Japan 932, 943 x00-7F, xA1-DF x81-9F, xE0-FC
Japan 942 x00-80, xA0-DF, xFD-FF x81-9F, xE0-FC
Taiwan 938 (*) x00-7E x81-FC
Taiwan 948 (*) x00-80, FD, FE x81-FC
Korea 949 x00-7F x8F-FE
Taiwan 950 x00-7E x81-FE
China 1381 x00-7F x8C-FE
Korea 1363 x00-7F x81-FE
China 1386 x00 x81-FE

Notes:

(*) means that this is an old code page and is not recommended anymore.

Country	Supported Mixed Code Page	Code Points for Single-byte Characters	Code Points for First Byte of Double-Byte Characters
Japan	932, 943	x00-7F, xA1-DF	x81-9F, xE0-FC
Japan	942	x00-80, xA0-DF, xFD-FF	x81-9F, xE0-FC
Taiwan	938 (*)	x00-7E	x81-FC
Taiwan	948 (*)	x00-80, FD, FE	x81-FC
Korea	949	x00-7F	x8F-FE
Taiwan	950	x00-7E	x81-FE
China	1381	x00-7F	x8C-FE
Korea	1363	x00-7F	x81-FE
China	1386	x00	x81-FE
Notes: (*) means that this is an old code page and is not recommended anymore.

Code points not assigned to either category above are not defined, and are processed as single-byte undefined code points.

Within each implied DBCS code table, there are 256 code points available as the second byte for each valid first byte. Second byte values can have any value from 0x40 to 0x7E and from 0x80 to 0xFE. Note that in DBCS environments, DB2 does not perform validity checking on individual double-byte characters.

Extended UNIX Code (EUC) Character Sets

Each EUC code page allows for both single-byte character code points, and up to three different sets of multi-byte character code points. This is accomplished by reserving a subset of the 256 available code points of each implied SBCS code page identifier for single-byte characters. The remainder of the code points is undefined, allocated as an element of a multi-byte character, or allocated as a single-shift introducer of a multi-byte character. These code points are shown in the following tables.

Table 145. Japanese EUC Code Points

Group 1st Byte 2nd Byte 3rd Byte 4th Byte
G0 x20-7E n/a n/a n/a
G1 xA1-FE xA1-FE n/a n/a
G2 x8E xA1-FE n/a n/a
G3 x8E xA1-FE xA1-FE n/a

Group	1st Byte	2nd Byte	3rd Byte	4th Byte
G0	x20-7E	n/a	n/a	n/a
G1	xA1-FE	xA1-FE	n/a	n/a
G2	x8E	xA1-FE	n/a	n/a
G3	x8E	xA1-FE	xA1-FE	n/a

Table 146. Traditional Chinese EUC Code Points

Group 1st Byte 2nd Byte 3rd Byte 4th Byte
G0 x20-7E n/a n/a n/a
G1 xA1-FE xA1-FE n/a n/a
G2 x8E xA1-FE xA1-FE xA1-FE
G3 n/a n/a n/a n/a

Group	1st Byte	2nd Byte	3rd Byte	4th Byte
G0	x20-7E	n/a	n/a	n/a
G1	xA1-FE	xA1-FE	n/a	n/a
G2	x8E	xA1-FE	xA1-FE	xA1-FE
G3	n/a	n/a	n/a	n/a

Table 147. Korean EUC Code Points

Group 1st Byte 2nd Byte 3rd Byte 4th Byte
G0 x20-7E n/a n/a n/a
G1 xA1-FE xA1-FE n/a n/a
G2 n/a n/a n/a n/a
G3 n/a n/a n/a n/a

Group	1st Byte	2nd Byte	3rd Byte	4th Byte
G0	x20-7E	n/a	n/a	n/a
G1	xA1-FE	xA1-FE	n/a	n/a
G2	n/a	n/a	n/a	n/a
G3	n/a	n/a	n/a	n/a

Table 148. Simplified Chinese EUC Code Points

Group 1st Byte 2nd Byte 3rd Byte 4th Byte
G0 x20-7E n/a n/a n/a
G1 xA1-FE xA1-FE n/a n/a
G2 n/a n/a n/a n/a
G3 n/a n/a n/a n/a

Group	1st Byte	2nd Byte	3rd Byte	4th Byte
G0	x20-7E	n/a	n/a	n/a
G1	xA1-FE	xA1-FE	n/a	n/a
G2	n/a	n/a	n/a	n/a
G3	n/a	n/a	n/a	n/a

Code points not assigned to the categories shown above are not defined, and are treated as single-byte undefined code points.

Character Set for Identifiers

The basic character set that may be used in database names consists of the single-byte uppercase and lowercase Latin letters (A...Z, a...z), the Arabic numerals (0...9) and the underscore character (_). This list of letters is augmented with the three special characters #, @ and $ to provide compatibility with host database products. However, these special characters should be used with care in an NLS environment because they are not included in the NLS host (EBCDIC) invariant character set.

When naming database objects (such as tables and views), program labels, host variables, cursors and statements alphabetics from the extended character set may also be used. For example, those letters with diacritical marks. The available characters depend on the code page in use and if you are using the database in a multiple code page environment, you must ensure that all code pages support any alphabetics you plan on using from the extended character set. See the SQL Reference for a discussion of delimited identifiers which can be used in SQL statements and can also contain characters outside the extended character set.

Extended Character Set Definition for DBCS Identifiers

In DBCS environments, the extended character set consists of all the characters in the basic character set, plus those identified as a letter or digit as follows:

All double-byte characters in each DBCS code page, except the double-byte space, are valid letters.
The double-byte space is a special character.
The single-byte characters available in each mixed code page are assigned to various categories as follows:

Category
Valid Code Points within each Mixed Code Page
Digits
x30-39
Letters
x23-24, x40-5A, x61-7A, xA6-DF (A6-DF for code pages 932 and 942 only)
Special Characters
All other valid single-byte character code points

Coding of SQL Statements

The coding of SQL statements is not language dependent. SQL is a programming language and, like other programming languages such as C, it is language invariant. The SQL keywords must be typed as shown, although they may be typed in uppercase, lowercase, or mixed case. The names of database objects, host variables and program labels that occur in an SQL statement cannot contain characters outside the database manager extended character set as described above.

Bidirectional CCSID Support

The following BiDi attributes are required for correct handling of Bidirectional data on different platforms:

   - Text type (LOGICAL vs VISUAL)
   - Shaping (SHAPED vs UNSHAPED)
   - Orientation (RIGHT-TO-LEFT vs LEFT-TO-RIGHT)
   - Numeral shape (ARABIC vs HINDI)
   - Symmetric swapping (YES or NO)

Defaults on different platforms are not the same, problems appear when DB2 data is sent from one platform to another. For example, Windows platforms use LOGICAL UNSHAPED data, while data on OS/390 is usually in SHAPED VISUAL format. Therefore, without any support for bidirectional attributes, data sent from DB2 Universal Database for OS/390 to DB2 UDB on a Windows 32-bit operating systems may display incorrectly.

Bidirectional-specific CCSIDs

DB2 supports bidirectional data attributes through special bidirectional Coded Character Set Identifiers (CCSIDs). The following bidirectional CCSIDs have been defined and are implemented with DB2 UDB:

        CCSID |  Code  | String
              |  Page  |  Type
       -------+--------+----------
         00420    420      4
         00424    424      4
         08612    420      5
         08616    424      6
         62208    856      4
         62209    862      4
         62210    916      4
         62211    424      5
         00856    856      5
         62213    862      5
         00916    916      5
         01255   1255      5
         01046   1046      5
         00864    864      5
         01089   1089      5
         01256   1256      5
         62220    856      6
         62221    862      6
         62222    916      6
         62223   1255      6
         62224    420      6
         62225    864      6
         62226   1046      6
         62227   1089      6
         62228   1256      6
         62235    424     10
         62236    856     10
         00862    862     10
         62238    916     10
         62239   1255     10
         62240    424     11
         62241    856     11
         62242    862     11
         62243    916     11
         62244   1255     11

Where CDRA String Types are defined:

    String |  Text | Numerical  | Orientation |  Shaping  | Symmetrical
     Type  |  Type |   Shape    |             |           |  Swapping
  ---------+-------+------------+-------------+-----------+-------------
      4     Visual     Arabic         LTR       Shaped         OFF
      5     Implicit   Arabic         LTR       Unshaped       ON
      6     Implicit   Arabic         RTL       Unshaped       ON
      7(*)  Visual     Arabic    Contextual(*)  Unshaped-Lig   OFF
      8     Visual     Arabic         RTL       Shaped         OFF
      9     Visual    Passthru        RTL       Shaped         ON
     10     Implicit             Contextual-L                  ON
     11     Implicit             Contextual-R                  ON

Note:

(*) Field orientation is left-to-right (LTR) when the first alphabetic character is a Latin one, and right-to-left (RTL) when it is a bidirectional (RTL) character. Characters are unshaped, but LamAlef ligatures are kept, and not broken into constituents.

DB2 Universal Database Implementation of Bidirectional Support

Bidirectional layout transformations are implemented in DB2 Universal Database using the new CCSID definitions. For the new BiDi-specific CCSIDs, layout transformations are performed instead of or in addition to code page conversions. To use this support, the DB2BIDI registry variable must be set to YES. By default, this variable is not set. This variable is used by the server for all conversions, and can only be set when the server is started. Setting DB2BIDI to YES may have some performance impact because of additional checking and layout transformations.

To specify a specific bidirectional CCSID in non-DRDA environment, select the appropriate CCSID from the above table that matches the characteristics of your client, and set DB2CODEPAGE to that value. If you already have a connection to the database, you must issue a TERMINATE command and connect again to make the new setting of DB2CODEPAGE take effect. If you select a CCSID which is not correct for code page or string type of your client platform, results would be unexpected. If you select an incompatible CCSID (i.e., Hebrew CCSID for connection to an Arabic database or vice-versa), or if DB2BIDI has not been set for the server, you will receive an error message when you try to connect.

For DRDA environments, if the HOST EBCDIC platform also supports these bidirectional CCSIDs, you need to only set DB2CODEPAGE as mentioned above. However, if HOST platform does not support these CCSIDs, you must specify a CCSID override for the HOST database server that you are connecting to. This is necessary because, in DRDA environment, code page conversions and layout transformations are performed by the receiver of data. However, if HOST server does not support these bidirectional CCSIDs, it does not perform layout transformation on the data that it receives from DB2 UDB. If you use a CCSID override, the DB2 UDB client performs layout transformation on the outbound data as well. For details of how to set a CCSID override, please refer to DB2 Connect Release Notes.

CCSID override is not supported for cases where the HOST EBCDIC platform is the client and DB2 UDB is the server.

DB2 Connect Implementation of Bidirectional Support

When data is exchanged between DB2 Connect and a database on a server, it is usually the receiver that performs conversion on the incoming data. The same convention would normally apply to bidirectional layout transformations also, which is in addition to the usual code page conversion. DB2 Connect has the optional ability to perform bidirectional layout transformation on data it is about to send to the server database in addition to data received from the server database.

In order for DB2 Connect to perform bidirectional layout transformation on outgoing data for a server database, the bidirectional Coded Character Set Identifier (CCSID) of the server database will have to be overridden. This is accomplished through the use of the BIDI parameter in the PARMS field of the DCS database directory entry for the server database.
Note: If you would like DB2 Connect to perform layout transformation on the data it is about to send to the DB2 host database, even though you do not have to override its CCSID, you still have to add the BIDI parameter in the DCS database directory PARMS field. In this case, the CCSID that you should provide is the default DB2 host database CCSID.

The BIDI parameter is to be specified as the ninth parameter in the PARMS field along with the bidirectional CCSID with which the user would like to override the default server database bidirectional CCSID in the following format:

   ",,,,,,,,BIDI=xyz"

where xyz is the CCSID override.
Note: The registry variable, DB2BIDI, must be set to "YES" in order for the BIDI parameter to take effect.

A list of the bidirectional CCSIDs that are supported along with their string types is found in Bidirectional-specific CCSIDs.

The use of this feature is best illustrated with an example.

Suppose you have a Hebrew DB2 client running CCSID 62213 (bidirectional string type 5) and you would like to access a DB2 host database running CCSID 00424 (bidirectional string type 4). However, you know that the data contained in the DB2 host database is instead based on CCSID 08616 (bidirectional string type 6).

There are two problems in this situation: The first is that the DB2 host database does not know the difference in the bidirectional string types with CCSIDs 00424 and 08616. The second problem is that the DB2 host database does not recognize the DB2 client CCSID of 62213. It only supports CCSID 00862, which is based on the same code page as CCSID 62213.

You will need to make sure that data sent to the DB2 host database is in bidirectional string type 6 format to begin with and also let DB2 Connect know that it has to perform bidirectional transformation on data it receives from the DB2 host database. You will need to use following catalog command for the DB2 host database:

   db2 catalog dcs database nydb1 as telaviv parms ",,,,,,,,BIDI=08616"

What this command does is tell DB2 Connect to override the DB2 host database CCSID of 00424 with 08616. This override includes the following processing:

DB2 Connect connects to the DB2 host database using CCSID 00862.
DB2 Connect performs bidirectional layout transformation on the data it is about to send to the DB2 host database. The transformation is from CCSID 62213 (bidirectional string type 5) to CCSID 62221 (bidirectional string type 6).
DB2 Connect performs bidirectional layout transformation on data it receives from the DB2 host database. The transformation is from CCSID 08616 (bidirectional string type 6) to CCSID 62213 (bidirectional string type 5).

Note:

In some cases, use of a bidirectional CCSID may cause the SQL query itself to be modified such that it is not recognized by the DB2 server. Specifically, you should avoid using IMPLICIT CONTEXTUAL and IMPLICIT RIGHT-TO-LEFT CCSIDs when a different string type can be used. CONTEXTUAL CCSIDs can produce unpredictable results if the SQL query contains quoted strings. Avoid using quoted strings in SQL statements and use host variables instead whenever possible.

If a specific bidirectional CCSID is causing problems which cannot be rectified by following these recommendations, then you should set DB2BIDI=NO.

Collating Sequences

The database manager compares character data using a collating sequence. This is an ordering for a set of characters that determines whether a particular character sorts higher, lower, or the same as another.
Note: Character string data defined with the FOR BIT DATA attribute, or BLOB data, is sorted using the binary sort sequence.

For example, a collating sequence can be used to indicate that lowercase and uppercase versions of a particular character are to be sorted equally.

The database manager allows databases to be created with custom collating sequences. The following sections help you determine and implement a particular collating sequence for a database.

Overview

In a database, each single-byte character is represented internally as a unique number between 0 and 255, (in hexadecimal notation, between X'00' and X'FF'). This number is referred to as the code point of the character; the assignments of numbers to characters in a set are collectively called a code page. A collating sequence is a mapping between the code point and the desired position of each character in a sorted sequence. The numeric value of the position is called the weight of the character in the collating sequence. The simplest collating sequence is one where the weights are identical to the code points. This is called the identity sequence.

For example, consider the characters B (X'42'), and b (X'62'). If, according to the collating sequence table, they both have a sort weight of X'42' (B), then they collate the same. If the sort weight for B is X'9E' and the sort weight for b is X'9D', then b will be sorted before B. Actual weights depend on the collating sequence table used which depends on the code set and locale. Note that a collating sequence table is not the same as a code page table, which defines code points.

Consider the following example. In ASCII, the characters A through Z are represented by X'41' through X'5A'. To describe a collating sequence where these are sorted in order, and consecutively (no intervening characters), you can write X'41', X'42', &ellipsis; X'59', X'5A'.

For multi-byte characters, the hexadecimal value of the multi-byte character is also used as the weight. For example, X'8260', X'8261' are the code points for double byte character A and B. In this case, the collation weights for X'82', X'60', and X'61' are used to sort these two characters according to their code points.

The values of the weights in a collating sequence need not be unique. For example, you could give uppercase letters and their lowercase equivalents the same weight.

Specifying the collating sequence can be simplified if a collating sequence provides weights for all 256 code points. The weight of each character can be determined using the code point of the character. This is the method used to specify a collating sequence for the database manager: a string of 256 bytes, where the nth byte (starting with 0) contains the weight of code point n.

In all cases, DB2 uses the collation table which was specified at database creation time. If you require the multi-byte characters to sort the way they appear in their code point table, you must specify IDENTITY as your collation sequence when you create the database.
Note: For DBCS characters in GRAPHIC fields, the sort sequence is always IDENTITY without regard to the collation sequence specified at database creation time.

Character Comparisons:

Once a collating sequence is established, character comparison is performed by comparing the weights of two characters, instead of directly comparing their code point values.

If weights that are not unique are used, characters that are not identical may compare equally. Because of this, string comparison must be a two-phase process:

Compare the characters of each string based on their weights.
If step 1 yielded equality, compare the characters of each string based on their code point values.

If the collating sequence contains 256 unique weights, only the first step is performed. If the collating sequence is the identity sequence only the second step is performed. In either case, there is a performance benefit.

For more information on character comparisons, see the SQL Reference.

Case Independent Comparisons:

To perform character comparisons that are independent of whether they are upper or lower case, you can use the TRANSLATE function to select and compare mixed case column data by translating it to upper case, but only for the purposes of comparison. Consider the following data:

  Abel
  abels
  ABEL
  abel
  ab
  Ab

For the following select statement:

SELECT c1 FROM T1 WHERE TRANSLATE(c1) LIKE 'AB%'

you would receive the following results:

  ab
  Ab
  abel
  Abel
  ABEL
  abels

Note:

You could also set the select as in the following view v1, and then make all your comparisons against the view (in upper case) and your inserts into the table in mixed case:

  CREATE VIEW v1 AS SELECT TRANSLATE(c1) FROM t1

At the database level, you can set the collating sequence as part of the CREATE DATABASE API. This allows you to decide if 'a' is processed before 'A', or if 'A' is processed after 'a', or if they are processed with equal weighting. This will make them equal when collating or sorting using the ORDER BY clause. If you have two values of 'a' and 'A', 'A' will always come before 'a', because in all senses they are equal, so the only difference upon which to sort is the hexadecimal value.

Thus if you issue SELECT c1 FROM t1 WHERE c1 LIKE 'ab%', you receive the following output:

  ab
  abel
  abels

If you issue SELECT c1 FROM t1 WHERE c1 LIKE 'A%', you receive the following output:

  Abel
  Ab
  ABEL

If you issue SELECT c1 FROM t1 ORDER BY c1, you receive the following:

  ab
  Ab
  abel
  Abel
  ABEL
  abels

Thus, you may want to consider using the scalar function TRANSLATE(), as well as the CREATE DATABASE API. Note that you can only specify a collating sequence using the CREATE DATABASE API. You cannot specify a collating sequence from the Command Line Processor. For information on the TRANSLATE() function, see the SQL Reference. For information on the CREATE DATABASE API see the Administrative API Reference.

You can also use the UCASE function as follows, but note that DB2 performs a table scan instead of using an index for the select:

    SELECT * FROM EMP WHERE UCASE(JOB) = 'NURSE'

Collating Sequence Sort Order: EBCDIC and ASCII Example

The order in which data in a database is sorted depends on the collating sequence defined for the database. For example, suppose that database A uses the EBCDIC code page's default collating sequence and that database B uses the ASCII code page's default collating sequence. Sort orders at these two databases would differ, as shown in Figure 107.

Figure 107. Example of How a Sort Order in an EBCDIC-Based Sequence Differs from a Sort Order in an ASCII-Based Sequence

SELECT..... ORDER BY COL2 EBCDIC-Based Sort ASCII-Based Sort COL2 COL2 ---- ---- V1G 7AB Y2W V1G 7AB Y2W

Similarly, character comparisons in a database depend on the collating sequence defined for that database. So if database A uses the EBCDIC code page's default collating sequence and database B uses the ASCII code page's default collating sequence, the results of character comparisons at the two databases would differ. Figure 108 illustrates the difference.

Figure 108. Example of How a Comparison of Characters in an EBCDIC-Based Sequence Differs from a Comparison of Characters in an ASCII-Based Sequence

SELECT..... WHERE COL2 > 'TT3' EBCDIC-Based Results ASCII-Based Results COL2 COL2 ---- ---- TW4 TW4 X72 X72 39G

If you are creating a federated database, consider specifying that your collating sequence matches the collating sequence at a data source. This approach will maximize "pushdown" opportunities and possibly increase query performance. See the Administration Guide, Performance for more information on the relationship between pushdown analysis, collating sequences, and query performance.

Specifying a Collating Sequence

The collating sequence for a database is specified at database creation time. Once the database has been created, the collating sequence cannot be changed.

The CREATE DATABASE API accepts a data structure called the Database Descriptor Block (SQLEDBDESC). You can define your own collating sequence within this structure.

To specify a collating sequence for a database:

Pass the desired SQLEDBDESC structure, or
Pass a NULL pointer. The collating sequence of the operating system (based on current country code and code page) is used. This is the same as specifying SQLDBCSS equal to SQL_CS_SYSTEM (0).

The SQLEDBDESC structure contains:

SQLDBCSS

A 4-byte integer indicating the source of the database collating sequence. Valid values are:

SQL_CS_SYSTEM: The collating sequence of the operating system (based on current country code and code page) is used.
SQL_CS_USER: The collating sequence is specified by the value in the SQLDBUDC field.
SQL_CS_NONE: The collating sequence is the identity sequence. Strings are compared byte for byte, starting with the first byte, using a simple code point comparison.

Note:

These constants are defined in the SQLENV include file.

SQLDBUDC

A 256-byte field. The nth byte contains the sort weight of the nth character in the code page of the database. If SQLDBCSS is not equal to SQL_CS_USER, this field is ignored.

Sample Collating Sequences:

Several sample collating sequences are provided (as include files) to facilitate database creation using the EBCDIC collating sequences instead of the default workstation collating sequence.

The collating sequences in these include files can be specified in the SQLDBUDC field of the SQLEDBDESC structure. They can also be used as models for the construction of other collating sequences.

General Concerns:

Once a collating sequence is defined, all future character comparisons for that database will be performed with that collating sequence. Except for character data defined as FOR BIT DATA or BLOB data, the collating sequence will be used for all SQL comparisons and ORDER BY clauses, and also in setting up indexes and statistics. For more information on how the database collating sequence is used, see the section on String Comparisons in the SQL Reference, N/A.

Potential problems may occur in the following cases:

An application merges sorted data from a database with application data that was sorted using a different collating sequence.
An application merges sorted data from one database with sorted data from another, but the databases have different collating sequences.
An application makes assumptions about sorted data that are not true for the relevant collating sequence. For example, numbers collating lower than alphabetics might or might not be true for a particular collating sequence.

A final point to remember is that the results of any sort based on a direct comparison of code points of characters will only match the results of a query ordered using an identity collating sequence.

Federated Database Concerns:

Your choice of database collating sequence can affect federated system performance. If a data source uses the same collating sequence as the DB2 federated database, DB2 can pushdown order-dependent processing involving character data to the data source. If a data source collating sequence does not match DB2's, data is retrieved and all order-dependent processing on character data is done locally (which can slow performance).

To determine if a data source and DB2 have the same collating sequence, consider the following factors:

National language support
The collating sequence is related to the language supported on a server. Compare DB2 NLS information to data source NLS information.
Data source characteristics
Some data sources are created using case-insensitive collating sequences, which can yield different results from DB2 in order-dependent operations.
Customization
Some data sources provide multiple options for collating sequences or allow the collating sequence to be customized.

Choose the collating sequence for a DB2 federated database based on the mix of data sources that will be accessed from that database. For example:

If a DB2 database will access mostly Oracle databases with the same code page (NLS) as DB2, specify SQL_CS_NONE for the SQLDBCSS structure (Oracle databases use an equivalent collating sequence).
If a DB2 database will access only DB2 UDB databases, ensure that you match SQLDBCSS values.

For information on how to set up a MVS collating sequence, you should refer to theAdministrative API Reference for samples under the "sqlecrea" topic. These samples contain collation tables for EBCIDIC 500, 37, and 5026/5035 code pages.

After you set the collating sequence for the DB2 database, ensure that you set the collating_sequence server option for each data source server. The collating_sequence option indicates if the collating sequence of a given data source server matches the collating sequence of the DB2 database.

Set the collating_sequence option to "Y" if the collating sequences match. This setting allows the DB2 optimizer to consider order-dependent processing at a data source, which can improve performance. However, if the data source collating sequence is not the same as the DB2 database collating sequence, you can receive incorrect results. For example, if your plan uses merge joins, the DB2 optimizer will pushdown ordering operations to the data sources as much as possible. If the data source collating sequence is not the same, the join results may not have a correct result set.

Set the collating_sequence option to "N" if the collating sequences do not match. Use this value when data source collating sequences differ from DB2 or when the data source collating operations might be case insensitive. For example, in a case insensitive data source with an English code page, TOLLESON, ToLLeSoN, and tolleson would all be considered equal. Set the collating_sequence option to "N" if you are not sure that the collating sequence at the data source is identical to the DB2 collating sequence.

Datetime Values

The datetime data types are described below. Although datetime values can be used in certain arithmetic and string operations and are compatible with certain strings, they are neither strings nor numbers.

Date

A date is a three-part value (year, month, and day). The range of the year part is 0001 to 9999. The range of the month part is 1 to 12. The range of the day part is 1 to x, where x depends on the month.

The internal representation of a date is a string of 4 bytes. Each byte consists of 2 packed decimal digits. The first 2 bytes represent the year, the third byte the month, and the last byte the day.

The length of a DATE column, as described in the SQLDA, is 10 bytes, which is the appropriate length for a character string representation of the value.

Time

A time is a three-part value (hour, minute, and second) designating a time of day under a 24-hour clock. The range of the hour part is 0 to 24; while the range of the other parts is 0 to 59. If the hour is 24, the minute and second specifications will be zero.

The internal representation of a time is a string of 3 bytes. Each byte is 2 packed decimal digits. The first byte represents the hour, the second byte the minute, and the last byte the second.

The length of a TIME column, as described in the SQLDA, is 8 bytes, which is the appropriate length for a character string representation of the value.

Timestamp

A timestamp is a seven-part value (year, month, day, hour, minute, second, and microsecond) that designates a datetime as defined above, except that the time includes a fractional specification of microseconds.

The internal representation of a timestamp is a string of 10 bytes, each of which consists of 2 packed decimal digits. The first 4 bytes represent the date, the next 3 bytes the time, and the last 3 bytes the microseconds.

The length of a TIMESTAMP column, as described in the SQLDA, is 26 bytes, which is the appropriate length for the character string representation of the value.

String Representations of Datetime Values

Values whose data types are DATE, TIME, or TIMESTAMP are represented in an internal form that is transparent to the SQL user. Dates, times, and timestamps can, however, also be represented by character strings, and these representations directly concern the SQL user since there are no constants or variables whose data types are DATE, TIME, or TIMESTAMP. Thus, to be retrieved, a datetime value must be assigned to a character string variable. The character string representation is normally the default format of datetime values associated with the country code of the database, unless overridden by specification of the F format option when the program is precompiled or bound to the database. See Table 151 for a listing of the string formats for the various country codes.

When a valid string representation of a datetime value is used in an operation with an internal datetime value, the string representation is converted to the internal form of the date, time, or timestamp before the operation is performed. The following sections define the valid string representations of datetime values.

Date Strings

A string representation of a date is a character string that starts with a digit and has a length of at least 8 characters. Trailing blanks may be included; leading zeros may be omitted from the month and day portions.

Valid string formats for dates are listed in Table 1. Each format is identified by name and includes an associated abbreviation and an example of its use.

Table 149. Formats for String Representations of Dates

Format Name Abbreviation Date Format Example
International Standards Organization ISO yyyy-mm-dd 1991-10-27
IBM USA standard USA mm/dd/yyyy 10/27/1991
IBM European standard EUR dd.mm.yyyy 27.10.1991
Japanese Industrial Standard Christian era JIS yyyy-mm-dd 1991-10-27
Site-defined (Local) LOC Depends on database country code --

Format Name	Abbreviation	Date Format	Example
International Standards Organization	ISO	yyyy-mm-dd	1991-10-27
IBM USA standard	USA	mm/dd/yyyy	10/27/1991
IBM European standard	EUR	dd.mm.yyyy	27.10.1991
Japanese Industrial Standard Christian era	JIS	yyyy-mm-dd	1991-10-27
Site-defined (Local)	LOC	Depends on database country code	--

Time Strings

A string representation of a time is a character string that starts with a digit and has a length of at least 4 characters. Trailing blanks may be included; a leading zero may be omitted from the hour part of the time and seconds may be omitted entirely. If you choose to omit seconds, an implicit specification of 0 seconds is assumed. Thus, 13.30 is equivalent to 13.30.00.

Valid string formats for times are listed in Table 150. Each format is identified by name and includes an associated abbreviation and an example of its use.

Table 150. Formats for String Representations of Times

Format Name Abbreviation Time Format Example
International Standards Organization ISO hh.mm.ss 13.30.05
IBM USA standard USA hh:mm AM or PM 1:30 PM
IBM European standard EUR hh.mm.ss 13.30.05
Japanese Industrial Standard Christian Era JIS hh:mm:ss 13:30:05
Site-defined (Local) LOC Depends on application country code --

Format Name	Abbreviation	Time Format	Example
International Standards Organization	ISO	hh.mm.ss	13.30.05
IBM USA standard	USA	hh:mm AM or PM	1:30 PM
IBM European standard	EUR	hh.mm.ss	13.30.05
Japanese Industrial Standard Christian Era	JIS	hh:mm:ss	13:30:05
Site-defined (Local)	LOC	Depends on application country code	--

Notes:

In ISO, EUR and JIS format, .ss (or :ss) is optional.
In the case of the USA time string format, the minutes specification may be omitted, indicating an implicit specification of 00 minutes. Thus 1 PM is equivalent to 1:00 PM.
In the USA time format, the hour must not be greater than 12 and cannot be 0 except for the special case of 00:00 AM. Using the ISO format of the 24-hour clock, the correspondence between the USA format and the 24-hour clock is as follows:

12:01 AM through 12:59 AM corresponds to 00.01.00 through 00.59.00.
01:00 AM through 11:59 AM corresponds to 01.00.00 through 11.59.00.
12:00 PM (noon) through 11:59 PM corresponds to 12.00.00 through 23.59.00.
12:00 AM (midnight) corresponds to 24.00.00 and 00:00 AM (midnight) corresponds to 00.00.00.

Timestamp Strings

A string representation of a timestamp is a character string that starts with a digit and has a length of at least 16 characters. The complete string representation of a timestamp has the form yyyy-mm-dd-hh.mm.ss.nnnnnn. Trailing blanks may be included. Leading zeros may be omitted from the month, day, and hour part of the timestamp, and microseconds may be truncated or entirely omitted. If you choose to omit any digit of the microseconds portion, an implicit specification of 0 is assumed. Thus, 1991-3-2-8.30.00 is equivalent to 1991-03-02-08.30.00.000000.

MBCS Considerations

Date and timestamp strings must contain only single-byte characters and digits.

Date and Time Formats:

The character string representation of date and time formats is the default format of datetime values associated with the country code of the application. This default format may be overridden by specification of the F format option when the program is precompiled or bound to the database.

The following is a description of the input and output formats for date and time:

Input Time Format
- There is no default input time format
- All time formats are allowed as input for all country codes.
Output Time Format
- The default output time format is equal to the local time format.
Input Date Format
- There is no default input date format
- Where the local format for date conflicts with an ISO, JIS, EUR, or USA date format, the local format is recognized for date input. For example, see the UK entry in Table 151.
Output Date Format
- The default output date format is shown in Table 151.
  Note: Table 151 also shows a listing of the string formats for the various country codes.

Table 151. Date and Time Formats by Country Code

Country Code Local Date Format Local Time Format Default Output Date Format Input Date Formats
785 Arabic dd/mm/yyyy JIS LOC LOC, EUR, ISO
001 Australia (1) mm-dd-yyyy JIS LOC LOC, USA, EUR, ISO
061 Australia dd-mm-yyyy JIS LOC LOC, USA, EUR, ISO
032 Belgium dd/mm/yyyy JIS LOC LOC, EUR, ISO
055 Brazil dd.mm.yy JIS LOC LOC, USA, EUR, ISO
359 Bulgaria dd.mm.yyyy JIS EUR LOC, USA, EUR, ISO
001 Canada mm-dd-yyyy JIS USA LOC, USA, EUR, ISO
002 Canada (French) dd-mm-yyyy ISO ISO LOC, USA, EUR, ISO
385 Croatia yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
042 Czech Republic yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
045 Denmark dd-mm-yyyy ISO ISO LOC, USA, EUR, ISO
358 Finland dd/mm/yyyy ISO EUR LOC, EUR, ISO
389 FYR Macedonia dd.mm.yyyy JIS EUR LOC, USA, EUR, ISO
033 France dd/mm/yyyy JIS EUR LOC, EUR, ISO
049 Germany dd/mm/yyyy ISO ISO LOC, EUR, ISO
030 Greece dd/mm/yyyy JIS LOC LOC, EUR, ISO
036 Hungary yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
354 Iceland dd-mm-yyyy JIS LOC LOC, USA, EUR, ISO
972 Israel dd/mm/yyyy JIS LOC LOC, EUR, ISO
039 Italy dd/mm/yyyy JIS LOC LOC, EUR, ISO
081 Japan mm/dd/yyyy JIS ISO LOC, USA, EUR, ISO
082 Korea mm/dd/yyyy JIS ISO LOC, USA, EUR, ISO
001 Latin America (1) mm-dd-yyyy JIS LOC LOC, USA, EUR, ISO
003 Latin America dd-mm-yyyy JIS LOC LOC, EUR, ISO
031 Netherlands dd-mm-yyyy JIS LOC LOC, USA, EUR, ISO
047 Norway dd/mm/yyyy ISO EUR LOC, EUR, ISO
048 Poland yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
351 Portugal dd/mm/yyyy JIS LOC LOC, EUR, ISO
086 PRC mm/dd/yyyy JIS ISO LOC, USA, EUR, ISO
040 Romania yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
007 Russia dd/mm/yyyy ISO LOC LOC, EUR, ISO
381 Serbia/Montenegro yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
042 Slovakia yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
386 Slovenia yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
034 Spain dd/mm/yyyy JIS LOC LOC, EUR, ISO
046 Sweden dd/mm/yyyy ISO ISO LOC, EUR, ISO
041 Switzerland dd/mm/yyyy ISO EUR LOC, EUR, ISO
088 Taiwan mm-dd-yyyy JIS ISO LOC, USA, EUR, ISO
066 Thailand (2) dd/mm/yyyy JIS LOC LOC, EUR, ISO
090 Turkey dd/mm/yyyy JIS LOC LOC, EUR, ISO
044 UK dd/mm/yyyy JIS LOC LOC, EUR, ISO
001 USA mm-dd-yyyy JIS USA LOC, USA, EUR, ISO

Notes:

Countries using the default C locale are assigned country code 001.
yyyy is in Buddhist era: Gregorian + 543 years.

Country Code	Local Date Format	Local Time Format	Default Output Date Format	Input Date Formats
785 Arabic	dd/mm/yyyy	JIS	LOC	LOC, EUR, ISO
001 Australia (1)	mm-dd-yyyy	JIS	LOC	LOC, USA, EUR, ISO
061 Australia	dd-mm-yyyy	JIS	LOC	LOC, USA, EUR, ISO
032 Belgium	dd/mm/yyyy	JIS	LOC	LOC, EUR, ISO
055 Brazil	dd.mm.yy	JIS	LOC	LOC, USA, EUR, ISO
359 Bulgaria	dd.mm.yyyy	JIS	EUR	LOC, USA, EUR, ISO
001 Canada	mm-dd-yyyy	JIS	USA	LOC, USA, EUR, ISO
002 Canada (French)	dd-mm-yyyy	ISO	ISO	LOC, USA, EUR, ISO
385 Croatia	yyyy-mm-dd	JIS	ISO	LOC, USA, EUR, ISO
042 Czech Republic	yyyy-mm-dd	JIS	ISO	LOC, USA, EUR, ISO
045 Denmark	dd-mm-yyyy	ISO	ISO	LOC, USA, EUR, ISO
358 Finland	dd/mm/yyyy	ISO	EUR	LOC, EUR, ISO
389 FYR Macedonia	dd.mm.yyyy	JIS	EUR	LOC, USA, EUR, ISO
033 France	dd/mm/yyyy	JIS	EUR	LOC, EUR, ISO
049 Germany	dd/mm/yyyy	ISO	ISO	LOC, EUR, ISO
030 Greece	dd/mm/yyyy	JIS	LOC	LOC, EUR, ISO
036 Hungary	yyyy-mm-dd	JIS	ISO	LOC, USA, EUR, ISO
354 Iceland	dd-mm-yyyy	JIS	LOC	LOC, USA, EUR, ISO
972 Israel	dd/mm/yyyy	JIS	LOC	LOC, EUR, ISO
039 Italy	dd/mm/yyyy	JIS	LOC	LOC, EUR, ISO
081 Japan	mm/dd/yyyy	JIS	ISO	LOC, USA, EUR, ISO
082 Korea	mm/dd/yyyy	JIS	ISO	LOC, USA, EUR, ISO
001 Latin America (1)	mm-dd-yyyy	JIS	LOC	LOC, USA, EUR, ISO
003 Latin America	dd-mm-yyyy	JIS	LOC	LOC, EUR, ISO
031 Netherlands	dd-mm-yyyy	JIS	LOC	LOC, USA, EUR, ISO
047 Norway	dd/mm/yyyy	ISO	EUR	LOC, EUR, ISO
048 Poland	yyyy-mm-dd	JIS	ISO	LOC, USA, EUR, ISO
351 Portugal	dd/mm/yyyy	JIS	LOC	LOC, EUR, ISO
086 PRC	mm/dd/yyyy	JIS	ISO	LOC, USA, EUR, ISO
040 Romania	yyyy-mm-dd	JIS	ISO	LOC, USA, EUR, ISO
007 Russia	dd/mm/yyyy	ISO	LOC	LOC, EUR, ISO
381 Serbia/Montenegro	yyyy-mm-dd	JIS	ISO	LOC, USA, EUR, ISO
042 Slovakia	yyyy-mm-dd	JIS	ISO	LOC, USA, EUR, ISO
386 Slovenia	yyyy-mm-dd	JIS	ISO	LOC, USA, EUR, ISO
034 Spain	dd/mm/yyyy	JIS	LOC	LOC, EUR, ISO
046 Sweden	dd/mm/yyyy	ISO	ISO	LOC, EUR, ISO
041 Switzerland	dd/mm/yyyy	ISO	EUR	LOC, EUR, ISO
088 Taiwan	mm-dd-yyyy	JIS	ISO	LOC, USA, EUR, ISO
066 Thailand (2)	dd/mm/yyyy	JIS	LOC	LOC, EUR, ISO
090 Turkey	dd/mm/yyyy	JIS	LOC	LOC, EUR, ISO
044 UK	dd/mm/yyyy	JIS	LOC	LOC, EUR, ISO
001 USA	mm-dd-yyyy	JIS	USA	LOC, USA, EUR, ISO
Notes: Countries using the default C locale are assigned country code 001. yyyy is in Buddhist era: Gregorian + 543 years.

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]