Administration Guide

Character Sets

The database manager does not, in general, restrict the character set available to an application. For a detailed explanation of multi-byte character sets (MBCS) supported by DB2, refer to the Application Development Guide.

Character Set for Identifiers

The basic character set that can be used in database names consists of the single-byte uppercase and lowercase Latin letters (A...Z, a...z), the Arabic numerals (0...9) and the underscore character (_). This list is augmented with three special characters (#, @, and $) to provide compatibility with host database products. However, these special characters should be used with care in an NLS environment, because they are not included in the NLS host (EBCDIC) invariant character set.

When naming database objects (such as tables and views), program labels, host variables, cursors, and elements from the extended character set (for example, letters with diacritical marks) can also be used. Precisely which characters are available depends on the code page in use. If you are using the database in a multiple code page environment, you must ensure that all code pages support any elements from the extended character set that you plan to use. For information about delimited identifiers that have characters outside of the extended character set, but which can be used in SQL statements, refer to the SQL Reference.

Extended Character Set Definition for DBCS Identifiers

In DBCS environments, the extended character set consists of all the characters in the basic character set, plus the following:

Coding SQL Statements

The coding of SQL statements is not language dependent. SQL keywords can be typed in uppercase, lowercase, or mixed case. The names of database objects and host variables, as well as program labels in an SQL statement cannot contain characters that are outside of the extended character set as described above.

Bidirectional CCSID Support

The following BiDi attributes are required for correct handling of bidirectional data on different platforms:

   - Text type (LOGICAL or VISUAL)
   - Shaping (SHAPED or UNSHAPED)
   - Orientation (RIGHT-TO-LEFT or LEFT-TO-RIGHT)
   - Numeral shape (ARABIC or HINDI)
   - Symmetric swapping (YES or NO)

Because default values on different platforms are not the same, problems can occur when DB2 data is moved from one platform to another. For example, the Windows operating system uses LOGICAL UNSHAPED data, while OS/390 usually uses SHAPED VISUAL data. Therefore, without support for bidirectional attributes, data sent from DB2 Universal Database for OS/390 to DB2 UDB on Windows 32-bit operating systems may display incorrectly.

Bidirectional-specific CCSIDs

DB2 supports bidirectional data attributes through special bidirectional Coded Character Set Identifiers (CCSIDs). The following bidirectional CCSIDs have been defined and are implemented with DB2 UDB:

          CCSID |  CCSID |  Code  | String
          (dec) |  (hex) |  Page  |  Type
         -------+--------+--------+--------
           00420   x'01A4'   420      4
           00424   x'01A8'   424      4
           08612   x'21A4'   420      5
           08616   x'21A8'   424     10
 
           00856   x'0358'   856      5
           00862   x'035E'   862      4
           00864   x'0360'   864      5
           00916   x'0394'   916      5
           01046   x'0416'  1046      5
           01089   x'0441'  1089      5
           01255   x'04E7'  1255      5
           01256   x'04E8'  1256      5
 
           62208   x'F300'   856      4
           62209   x'F301'   862     10
           62210   x'F302'   916      4
           62211   x'F303'   424      5
           62213   x'F305'   862      5
           62215   x'F307'  1255      4
           62218   x'F30A'   864      4
           62220   x'F30C'   856      6
           62221   x'F30D'   862      6
           62222   x'F30E'   916      6
           62223   x'F30F'  1255      6
           62224   x'F310'   420      6
           62225   x'F311'   864      6
           62226   x'F312'  1046      6
           62227   x'F313'  1089      6
           62228   x'F314'  1256      6
           62229   x'F315'   424      8
           62230   x'F316'   856      8
           62231   x'F317'   862      8
           62232   x'F318'   916      8
           62233   x'F319'   420      8
           62234   x'F31A'   420      9
           62235   x'F31B'   424      6
           62236   x'F31C'   856     10
           62237   x'F31D'  1255      8
           62238   x'F31E'   916     10
           62239   x'F31F'  1255     10
           62240   x'F320'   424     11
           62241   x'F321'   856     11
           62242   x'F322'   862     11
           62243   x'F323'   916     11
           62244   x'F324'  1255     11
 
           62245   x'F325'   424     10
           62246   x'F326'  1046      8
           62247   x'F327'  1046      9
           62248   x'F328'  1046      4
           62249   x'F329'  1046     12
           62250   x'F32A'   420     12

where CDRA string types are defined as:

    String |  Text | Numerical  | Orientation |  Shaping  | Symmetrical
     Type  |  Type |   Shape    |             |           |  Swapping
   --------+-------+------------+-------------+-----------+-------------
      4     Visual    Passthru        LTR       Shaped         OFF
      5     Implicit   Arabic         LTR       Unshaped       ON
      6     Implicit   Arabic         RTL       Unshaped       ON
      7(*)  Visual     Arabic    Contextual(*)  Unshaped-Lig   OFF
      8     Visual     Arabic         RTL       Shaped         OFF
      9     Visual    Passthru        RTL       Shaped         ON
     10     Implicit  Passthru   Contextual-L   Unshaped       ON
     11     Implicit  Passthru   Contextual-R   Unshaped       ON
     12     Implicit   Arabic         RTL       Shaped         ON
Note:(*) Field orientation is left-to-right (LTR) when the first alphabetic character is a Latin character, and right-to-left (RTL) when it is a bidirectional (RTL) character. Characters are unshaped, but LamAlef ligatures are kept, and are not broken into constituents.

DB2 Universal Database Implementation of Bidirectional Support

Bidirectional layout transformations are implemented in DB2 Universal Database using the new CCSID definitions. For the new BiDi-specific CCSIDs, layout transformations are performed instead of, or in addition to, code page conversions. To use this support, the DB2BIDI registry variable must be set to YES. By default, this variable is not set. It is used by the server for all conversions, and can only be set when the server is started. Setting DB2BIDI to YES may have some performance impact because of additional checking and layout transformations.

To specify a particular bidirectional CCSID in a non-DRDA environment, select the CCSID (from the above table) that matches the characteristics of your client, and set DB2CODEPAGE to that value. If you already have a connection to the database, you must issue a TERMINATE command, and then reconnect to allow the new setting for DB2CODEPAGE to take effect. If you select a CCSID that is not appropriate for the code page or string type of your client platform, you may get unexpected results. If you select an incompatible CCSID (for example, the Hebrew CCSID for connection to an Arabic database), or if DB2BIDI has not been set for the server, you will receive an error message when you try to connect.

For DRDA environments, if the HOST EBCDIC platform also supports these bidirectional CCSIDs, you only need to set the DB2CODEPAGE value. However, if the HOST platform does not support these CCSIDs, you must also specify a CCSID override for the HOST database server to which you are connecting. This is necessary because, in a DRDA environment, code page conversions and layout transformations are performed by the receiver of data. However, if the HOST server does not support these bidirectional CCSIDs, it does not perform layout transformation on the data that it receives from DB2 UDB. If you use a CCSID override, the DB2 UDB client performs layout transformation on the outbound data as well. For information about setting a CCSID override, refer to the DB2 Connect User's Guide.

CCSID override is not supported for cases where the HOST EBCDIC platform is the client, and DB2 UDB is the server.

DB2 Connect Implementation of Bidirectional Support

When data is exchanged between DB2 Connect and a database on the server, it is usually the receiver that performs conversion on the incoming data. The same convention would normally apply to bidirectional layout transformations, and is in addition to the usual code page conversion. DB2 Connect has the optional ability to perform bidirectional layout transformation on data it is about to send to the server database, in addition to data received from the server database.

In order for DB2 Connect to perform bidirectional layout transformation on outgoing data for a server database, the bidirectional CCSID of the server database must be overridden. This is accomplished through the use of the BIDI parameter in the PARMS field of the DCS database directory entry for the server database.
Note:If you want DB2 Connect to perform layout transformation on the data it is about to send to the DB2 host database, even though you do not have to override its CCSID, you must still add the BIDI parameter to the PARMS field of the DCS database directory. In this case, the CCSID that you should provide is the default DB2 host database CCSID.

The BIDI parameter is to be specified as the ninth parameter in the PARMS field, along with the bidirectional CCSID with which you want to override the default server database bidirectional CCSID:

   ",,,,,,,,BIDI=xyz"

where xyz is the CCSID override.
Note:The registry variable DB2BIDI must be set to YES for the BIDI parameter to take effect.

A list of the bidirectional CCSIDs that are supported, along with their string types, can be found in Bidirectional-specific CCSIDs.

The use of this feature is best described with an example.

Suppose you have a Hebrew DB2 client running CCSID 62213 (bidirectional string type 5), and you want to access a DB2 host database running CCSID 00424 (bidirectional string type 4). However, you know that the data contained in the DB2 host database is based on CCSID 08616 (bidirectional string type 6).

There are two problems here: The first is that the DB2 host database does not know the difference in the bidirectional string types with CCSIDs 00424 and 08616. The second problem is that the DB2 host database does not recognize the DB2 client CCSID (62213). It only supports CCSID 00862, which is based on the same code page as CCSID 62213.

You will need to ensure that data sent to the DB2 host database is in bidirectional string type 6 format to begin with, and also let DB2 Connect know that it has to perform bidirectional transformation on data it receives from the DB2 host database. You will need to use following catalog command for the DB2 host database:

   db2 catalog dcs database nydb1 as telaviv parms ",,,,,,,,BIDI=08616"

This command tells DB2 Connect to override the DB2 host database CCSID of 00424 with 08616. This override includes the following processing:

  1. DB2 Connect connects to the DB2 host database using CCSID 00862.
  2. DB2 Connect performs bidirectional layout transformation on the data it is about to send to the DB2 host database. The transformation is from CCSID 62213 (bidirectional string type 5) to CCSID 62221 (bidirectional string type 6).
  3. DB2 Connect performs bidirectional layout transformation on data it receives from the DB2 host database. This transformation is from CCSID 08616 (bidirectional string type 6) to CCSID 62213 (bidirectional string type 5).
Note:In some cases, use of a bidirectional CCSID may cause the SQL query itself to be modified in such a way that it is not recognized by the DB2 server. Specifically, you should avoid using IMPLICIT CONTEXTUAL and IMPLICIT RIGHT-TO-LEFT CCSIDs when a different string type can be used. CONTEXTUAL CCSIDs can produce unpredictable results if the SQL query contains quoted strings. Avoid using quoted strings in SQL statements; use host variables whenever possible.

If a specific bidirectional CCSID is causing problems that cannot be rectified by following these recommendations, set DB2BIDI to NO.

Collating Sequences

The database manager compares character data using a collating sequence. This is an ordering for a set of characters that determines whether a particular character sorts higher, lower, or the same as another. For example, a collating sequence can be used to indicate that lowercase and uppercase versions of a particular character are to be sorted equally.

The collating sequence is specified at database creation time, and cannot be modified later.

The database manager allows databases to be created with custom collating sequences, using the application programming interface (API). For information about implementing a custom collating sequence table, refer to the Application Development Guide.
Note:Character string data defined with the FOR BIT DATA attribute, and BLOB data, is sorted using the binary sort sequence.

General Concerns

Once a collating sequence is defined, all future character comparisons for that database will be performed with that collating sequence. Except for character data defined as FOR BIT DATA or BLOB data, the collating sequence will be used for all SQL comparisons and ORDER BY clauses, and also in setting up indexes and statistics. For more information about how the database collating sequence is used, see "String Comparisons" in the SQL Reference.

Potential problems can occur in the following cases:

A final point to remember is that the results of any sort based on a direct comparison of character code points will only match query results that are ordered using an identity collating sequence.

Federated Database Concerns

Your choice of database collating sequence can affect federated system performance. If a data source uses the same collating sequence as the DB2 federated database, DB2 can push down order-dependent processing involving character data to the data source. If a data source collating sequence does not match the DB2 collating sequence, data is retrieved, and all order-dependent processing on character data is done locally (this can reduce performance).

To determine whether a data source and DB2 have the same collating sequence, consider the following:

Choose the collating sequence for a DB2 federated database based on the mix of data sources that will be accessed from that database. For example:

For information about setting up an MVS collating sequence, refer to the Administrative API Reference for examples under the description of the sqlecrea - Create Database API. These examples contain collation tables for the EBCIDIC 500, 37, and 5026/5035 code pages.

After you set the collating sequence for the DB2 database, ensure that you set the collating_sequence server option for each data source server. This option specifies whether the collating sequence of a given data source server matches the collating sequence of the DB2 database.

Set the collating_sequence option to "Y" if the collating sequences match. This setting allows the DB2 optimizer to consider order-dependent processing at a data source, which can improve performance. However, if the data source collating sequence is not the same as the DB2 database collating sequence, you may receive incorrect results. For example, if your plan uses merge joins, the DB2 optimizer will push down ordering operations to the data sources as much as possible. If the data source collating sequence is not the same, the join result set may not be correct.

Set the collating_sequence option to "N" if the collating sequences do not match. Use this value when data source collating sequences differ from DB2, or when the data source collating operations might be case insensitive. For example, in a case-insensitive data source with an English code page, TOLLESON, ToLLeSoN, and tolleson would all be considered equal. Set the collating_sequence option to "N" if you are not sure whether the collating sequence at the data source is identical to the DB2 collating sequence.

Datetime Values

The datetime data types are described below. Although datetime values can be used in certain arithmetic and string operations, and are compatible with certain strings, they are neither strings nor numbers.

Date

A date is a three-part value (year, month, and day). The range of the year part is 0001 to 9999. The range of the month part is 1 to 12. The range of the day part is 1 to x, where x depends on the month.

The internal representation of a date is a string of 4 bytes. Each byte consists of 2 packed decimal digits. The first 2 bytes represent the year, the third byte the month, and the last byte the day.

The length of a DATE column, as described in the SQLDA, is 10 bytes, which is the appropriate length for a character string representation of the value.

Time

A time is a three-part value (hour, minute, and second) designating a time of day under a 24-hour clock. The range of the hour part is 0 to 24. The range of the other parts is 0 to 59. If the hour is 24, the minute and second specifications are zero.

The internal representation of a time is a string of 3 bytes. Each byte is 2 packed decimal digits. The first byte represents the hour, the second byte the minute, and the last byte the second.

The length of a TIME column, as described in the SQLDA, is 8 bytes, which is the appropriate length for a character string representation of the value.

Time Stamp

A time stamp is a seven-part value (year, month, day, hour, minute, second, and microsecond) designating a date and a time of day as defined above, except that the time includes the specification of microseconds.

The internal representation of a time stamp is a string of 10 bytes. Each byte is 2 packed decimal digits. The first 4 bytes represent the date, the next 3 bytes the time, and the last 3 bytes the microseconds.

The length of a TIMESTAMP column, as described in the SQLDA, is 26 bytes, which is the appropriate length for a character string representation of the value.

String Representations of Datetime Values

Values whose data types are DATE, TIME, or TIMESTAMP are represented in an internal form that is transparent to the SQL user. Dates, times, and time stamps can also, however, be represented by character strings, and these representations directly concern the SQL user, because there are no constants or variables whose data types are DATE, TIME, or TIMESTAMP. Thus, to be retrieved, a datetime value must be assigned to a character string variable. The character string representation is normally the default format of datetime values associated with the country code of the client, unless overridden by specification of the "F" format option when the program is precompiled or bound to the database. For a list of the string formats for the various country codes, see Table 94.

When a valid string representation of a datetime value is used in an operation with an internal datetime value, the string representation is converted to the internal form of the date, time, or time stamp before the operation is performed. Valid string representations of datetime values are defined in the following sections.

Date Strings

A string representation of a date is a character string that starts with a digit and has a length of at least 8 characters. Trailing blanks may be included; leading zeros may be omitted from the month part and the day part of the date.

Valid string formats for dates are listed in Table 92. Each format is identified by name, and includes an associated abbreviation and an example of its use.

Table 92. Formats for String Representations of Dates
Format Name Abbreviation Date Format Example
International Standards Organization ISO yyyy-mm-dd 1991-10-27
IBM USA standard USA mm/dd/yyyy 10/27/1991
IBM European standard EUR dd.mm.yyyy 27.10.1991
Japanese Industrial Standard Christian era JIS yyyy-mm-dd 1991-10-27
Site-defined (Local) LOC Depends on database country code --

Time Strings

A string representation of a time is a character string that starts with a digit and has a length of at least 4 characters. Trailing blanks may be included; a leading zero may be omitted from the hour part of the time, and seconds may be omitted entirely. If you choose to omit seconds, an implicit specification of 0 seconds is assumed. Thus, 13.30 is equivalent to 13.30.00.

Valid string formats for times are listed in Table 93. Each format is identified by name, and includes an associated abbreviation and an example of its use.

Table 93. Formats for String Representations of Times
Format Name Abbreviation Time Format Example
International Standards Organization ISO hh.mm.ss 13.30.05
IBM USA standard USA hh:mm AM or PM 1:30 PM
IBM European standard EUR hh.mm.ss 13.30.05
Japanese Industrial Standard Christian Era JIS hh:mm:ss 13:30:05
Site-defined (Local) LOC Depends on application country code --

Notes:

  1. In ISO, EUR, or JIS time string format, .ss (or :ss) is optional.

  2. In USA time string format, the minutes specification can be omitted, indicating an implicit specification of 00 minutes. Thus, 1 PM is equivalent to 1:00 PM.

  3. In USA time string format, the hours specification cannot be greater than 12, and cannot be 0, except in the special case of 00:00 AM. Using the ISO format of the 24-hour clock, the correspondence between the USA format and the 24-hour clock is as follows:

Time Stamp Strings

A string representation of a time stamp is a character string that starts with a digit and has a length of at least 16 characters. The complete string representation of a time stamp has the form yyyy-mm-dd-hh.mm.ss.nnnnnn. Trailing blanks may be included; leading zeros may be omitted from the month, day, or hour part of the time stamp, and microseconds may be truncated or omitted entirely. If you choose to omit any digit of the microseconds part, an implicit specification of 0 is assumed. Thus, 1991-3-2-8.30.00 is equivalent to 1991-03-02-08.30.00.000000.

MBCS Considerations

Date and time stamp strings must contain only single-byte characters and digits.

Date and Time Formats

The character string representation of date and time formats is the default format of datetime values associated with the country code of the application. This default format can be overridden by specifying the "F" format option when the program is precompiled or bound to the database.

Following is a description of the input and output formats for date and time:


Table 94. Date and Time Formats by Country Code
Country Code Local Date Format Local Time Format Default Output Date Format Input Date Formats
355 Albania yyyy-mm-dd JIS LOC LOC, USA, EUR, ISO
785 Arabic dd/mm/yyyy JIS LOC LOC, EUR, ISO
001 Australia (1) mm-dd-yyyy JIS LOC LOC, USA, EUR, ISO
061 Australia dd-mm-yyyy JIS LOC LOC, USA, EUR, ISO
032 Belgium dd/mm/yyyy JIS LOC LOC, EUR, ISO
055 Brazil dd.mm.yyyy JIS LOC LOC, EUR, ISO
359 Bulgaria dd.mm.yyyy JIS EUR LOC, USA, EUR, ISO
001 Canada mm-dd-yyyy JIS USA LOC, USA, EUR, ISO
002 Canada (French) dd-mm-yyyy ISO ISO LOC, USA, EUR, ISO
385 Croatia yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
042 Czech Republic yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
045 Denmark dd-mm-yyyy ISO ISO LOC, USA, EUR, ISO
358 Finland dd/mm/yyyy ISO EUR LOC, EUR, ISO
389 FYR Macedonia dd.mm.yyyy JIS EUR LOC, USA, EUR, ISO
033 France dd/mm/yyyy JIS EUR LOC, EUR, ISO
049 Germany dd/mm/yyyy ISO ISO LOC, EUR, ISO
030 Greece dd/mm/yyyy JIS LOC LOC, EUR, ISO
036 Hungary yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
354 Iceland dd-mm-yyyy JIS LOC LOC, USA, EUR, ISO
091 India dd/mm/yyyy JIS LOC LOC, EUR, ISO
972 Israel dd/mm/yyyy JIS LOC LOC, EUR, ISO
039 Italy dd/mm/yyyy JIS LOC LOC, EUR, ISO
081 Japan mm/dd/yyyy JIS ISO LOC, USA, EUR, ISO
082 Korea mm/dd/yyyy JIS ISO LOC, USA, EUR, ISO
001 Latin America (1) mm-dd-yyyy JIS LOC LOC, USA, EUR, ISO
003 Latin America dd-mm-yyyy JIS LOC LOC, EUR, ISO
031 Netherlands dd-mm-yyyy JIS LOC LOC, USA, EUR, ISO
047 Norway dd/mm/yyyy ISO EUR LOC, EUR, ISO
048 Poland yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
351 Portugal dd/mm/yyyy JIS LOC LOC, EUR, ISO
086 People's Republic of China mm/dd/yyyy JIS ISO LOC, USA, EUR, ISO
040 Romania yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
007 Russia dd/mm/yyyy ISO LOC LOC, EUR, ISO
381 Serbia/ Montenegro yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
042 Slovakia yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
386 Slovenia yyyy-mm-dd JIS ISO LOC, USA, EUR, ISO
034 Spain dd/mm/yyyy JIS LOC LOC, EUR, ISO
046 Sweden dd/mm/yyyy ISO ISO LOC, EUR, ISO
041 Switzerland dd/mm/yyyy ISO EUR LOC, EUR, ISO
088 Taiwan mm-dd-yyyy JIS ISO LOC, USA, EUR, ISO
066 Thailand (2) dd/mm/yyyy JIS LOC LOC, EUR, ISO
090 Turkey dd/mm/yyyy JIS LOC LOC, EUR, ISO
044 UK dd/mm/yyyy JIS LOC LOC, EUR, ISO
001 USA mm-dd-yyyy JIS USA LOC, USA, EUR, ISO
084 Vietnam dd/mm/yyyy JIS LOC LOC, EUR, ISO

Notes:

  1. Countries using the default C locale are assigned country code 001.

  2. yyyy in Buddhist era is equivalent to Gregorian + 543 years (Thailand only).


[ Top of Page | Previous Page | Next Page ]