The database manager does not, in general, restrict the character set available to an application. For a detailed explanation of multi-byte character sets (MBCS) supported by DB2, refer to the Application Development Guide.
The basic character set that can be used in database names consists of the single-byte uppercase and lowercase Latin letters (A...Z, a...z), the Arabic numerals (0...9) and the underscore character (_). This list is augmented with three special characters (#, @, and $) to provide compatibility with host database products. However, these special characters should be used with care in an NLS environment, because they are not included in the NLS host (EBCDIC) invariant character set.
When naming database objects (such as tables and views), program labels, host variables, cursors, and elements from the extended character set (for example, letters with diacritical marks) can also be used. Precisely which characters are available depends on the code page in use. If you are using the database in a multiple code page environment, you must ensure that all code pages support any elements from the extended character set that you plan to use. For information about delimited identifiers that have characters outside of the extended character set, but which can be used in SQL statements, refer to the SQL Reference.
In DBCS environments, the extended character set consists of all the characters in the basic character set, plus the following:
The coding of SQL statements is not language dependent. SQL keywords can be typed in uppercase, lowercase, or mixed case. The names of database objects and host variables, as well as program labels in an SQL statement cannot contain characters that are outside of the extended character set as described above.
The following BiDi attributes are required for correct handling of bidirectional data on different platforms:
- Text type (LOGICAL or VISUAL) - Shaping (SHAPED or UNSHAPED) - Orientation (RIGHT-TO-LEFT or LEFT-TO-RIGHT) - Numeral shape (ARABIC or HINDI) - Symmetric swapping (YES or NO)
Because default values on different platforms are not the same, problems can occur when DB2 data is moved from one platform to another. For example, the Windows operating system uses LOGICAL UNSHAPED data, while OS/390 usually uses SHAPED VISUAL data. Therefore, without support for bidirectional attributes, data sent from DB2 Universal Database for OS/390 to DB2 UDB on Windows 32-bit operating systems may display incorrectly.
DB2 supports bidirectional data attributes through special bidirectional Coded Character Set Identifiers (CCSIDs). The following bidirectional CCSIDs have been defined and are implemented with DB2 UDB:
CCSID | CCSID | Code | String (dec) | (hex) | Page | Type -------+--------+--------+-------- 00420 x'01A4' 420 4 00424 x'01A8' 424 4 08612 x'21A4' 420 5 08616 x'21A8' 424 10 00856 x'0358' 856 5 00862 x'035E' 862 4 00864 x'0360' 864 5 00916 x'0394' 916 5 01046 x'0416' 1046 5 01089 x'0441' 1089 5 01255 x'04E7' 1255 5 01256 x'04E8' 1256 5 62208 x'F300' 856 4 62209 x'F301' 862 10 62210 x'F302' 916 4 62211 x'F303' 424 5 62213 x'F305' 862 5 62215 x'F307' 1255 4 62218 x'F30A' 864 4 62220 x'F30C' 856 6 62221 x'F30D' 862 6 62222 x'F30E' 916 6 62223 x'F30F' 1255 6 62224 x'F310' 420 6 62225 x'F311' 864 6 62226 x'F312' 1046 6 62227 x'F313' 1089 6 62228 x'F314' 1256 6 62229 x'F315' 424 8 62230 x'F316' 856 8 62231 x'F317' 862 8 62232 x'F318' 916 8 62233 x'F319' 420 8 62234 x'F31A' 420 9 62235 x'F31B' 424 6 62236 x'F31C' 856 10 62237 x'F31D' 1255 8 62238 x'F31E' 916 10 62239 x'F31F' 1255 10 62240 x'F320' 424 11 62241 x'F321' 856 11 62242 x'F322' 862 11 62243 x'F323' 916 11 62244 x'F324' 1255 11 62245 x'F325' 424 10 62246 x'F326' 1046 8 62247 x'F327' 1046 9 62248 x'F328' 1046 4 62249 x'F329' 1046 12 62250 x'F32A' 420 12
where CDRA string types are defined as:
String | Text | Numerical | Orientation | Shaping | Symmetrical Type | Type | Shape | | | Swapping --------+-------+------------+-------------+-----------+------------- 4 Visual Passthru LTR Shaped OFF 5 Implicit Arabic LTR Unshaped ON 6 Implicit Arabic RTL Unshaped ON 7(*) Visual Arabic Contextual(*) Unshaped-Lig OFF 8 Visual Arabic RTL Shaped OFF 9 Visual Passthru RTL Shaped ON 10 Implicit Passthru Contextual-L Unshaped ON 11 Implicit Passthru Contextual-R Unshaped ON 12 Implicit Arabic RTL Shaped ON
Note: | (*) Field orientation is left-to-right (LTR) when the first alphabetic character is a Latin character, and right-to-left (RTL) when it is a bidirectional (RTL) character. Characters are unshaped, but LamAlef ligatures are kept, and are not broken into constituents. |
Bidirectional layout transformations are implemented in DB2 Universal Database using the new CCSID definitions. For the new BiDi-specific CCSIDs, layout transformations are performed instead of, or in addition to, code page conversions. To use this support, the DB2BIDI registry variable must be set to YES. By default, this variable is not set. It is used by the server for all conversions, and can only be set when the server is started. Setting DB2BIDI to YES may have some performance impact because of additional checking and layout transformations.
To specify a particular bidirectional CCSID in a non-DRDA environment, select the CCSID (from the above table) that matches the characteristics of your client, and set DB2CODEPAGE to that value. If you already have a connection to the database, you must issue a TERMINATE command, and then reconnect to allow the new setting for DB2CODEPAGE to take effect. If you select a CCSID that is not appropriate for the code page or string type of your client platform, you may get unexpected results. If you select an incompatible CCSID (for example, the Hebrew CCSID for connection to an Arabic database), or if DB2BIDI has not been set for the server, you will receive an error message when you try to connect.
For DRDA environments, if the HOST EBCDIC platform also supports these bidirectional CCSIDs, you only need to set the DB2CODEPAGE value. However, if the HOST platform does not support these CCSIDs, you must also specify a CCSID override for the HOST database server to which you are connecting. This is necessary because, in a DRDA environment, code page conversions and layout transformations are performed by the receiver of data. However, if the HOST server does not support these bidirectional CCSIDs, it does not perform layout transformation on the data that it receives from DB2 UDB. If you use a CCSID override, the DB2 UDB client performs layout transformation on the outbound data as well. For information about setting a CCSID override, refer to the DB2 Connect User's Guide.
CCSID override is not supported for cases where the HOST EBCDIC platform is the client, and DB2 UDB is the server.
When data is exchanged between DB2 Connect and a database on the server, it is usually the receiver that performs conversion on the incoming data. The same convention would normally apply to bidirectional layout transformations, and is in addition to the usual code page conversion. DB2 Connect has the optional ability to perform bidirectional layout transformation on data it is about to send to the server database, in addition to data received from the server database.
In order for DB2 Connect to perform bidirectional layout transformation on outgoing data for a server database, the bidirectional CCSID of the server database must be overridden. This is accomplished through the use of the BIDI parameter in the PARMS field of the DCS database directory entry for the server database.
Note: | If you want DB2 Connect to perform layout transformation on the data it is about to send to the DB2 host database, even though you do not have to override its CCSID, you must still add the BIDI parameter to the PARMS field of the DCS database directory. In this case, the CCSID that you should provide is the default DB2 host database CCSID. |
The BIDI parameter is to be specified as the ninth parameter in the PARMS field, along with the bidirectional CCSID with which you want to override the default server database bidirectional CCSID:
",,,,,,,,BIDI=xyz"
where xyz is the CCSID override.
Note: | The registry variable DB2BIDI must be set to YES for the BIDI parameter to take effect. |
A list of the bidirectional CCSIDs that are supported, along with their string types, can be found in Bidirectional-specific CCSIDs.
The use of this feature is best described with an example.
Suppose you have a Hebrew DB2 client running CCSID 62213 (bidirectional string type 5), and you want to access a DB2 host database running CCSID 00424 (bidirectional string type 4). However, you know that the data contained in the DB2 host database is based on CCSID 08616 (bidirectional string type 6).
There are two problems here: The first is that the DB2 host database does not know the difference in the bidirectional string types with CCSIDs 00424 and 08616. The second problem is that the DB2 host database does not recognize the DB2 client CCSID (62213). It only supports CCSID 00862, which is based on the same code page as CCSID 62213.
You will need to ensure that data sent to the DB2 host database is in bidirectional string type 6 format to begin with, and also let DB2 Connect know that it has to perform bidirectional transformation on data it receives from the DB2 host database. You will need to use following catalog command for the DB2 host database:
db2 catalog dcs database nydb1 as telaviv parms ",,,,,,,,BIDI=08616"
This command tells DB2 Connect to override the DB2 host database CCSID of 00424 with 08616. This override includes the following processing:
Note: | In some cases, use of a bidirectional CCSID may cause the SQL query itself to
be modified in such a way that it is not recognized by the DB2 server.
Specifically, you should avoid using IMPLICIT CONTEXTUAL and IMPLICIT
RIGHT-TO-LEFT CCSIDs when a different string type can be used.
CONTEXTUAL CCSIDs can produce unpredictable results if the SQL query contains
quoted strings. Avoid using quoted strings in SQL statements; use
host variables whenever possible.
If a specific bidirectional CCSID is causing problems that cannot be rectified by following these recommendations, set DB2BIDI to NO. |
The database manager compares character data using a collating sequence. This is an ordering for a set of characters that determines whether a particular character sorts higher, lower, or the same as another. For example, a collating sequence can be used to indicate that lowercase and uppercase versions of a particular character are to be sorted equally.
The collating sequence is specified at database creation time, and cannot be modified later.
The database manager allows databases to be created with custom collating sequences, using the application programming interface (API). For information about implementing a custom collating sequence table, refer to the Application Development Guide.
Note: | Character string data defined with the FOR BIT DATA attribute, and BLOB data, is sorted using the binary sort sequence. |
Once a collating sequence is defined, all future character comparisons for that database will be performed with that collating sequence. Except for character data defined as FOR BIT DATA or BLOB data, the collating sequence will be used for all SQL comparisons and ORDER BY clauses, and also in setting up indexes and statistics. For more information about how the database collating sequence is used, see "String Comparisons" in the SQL Reference.
Potential problems can occur in the following cases:
A final point to remember is that the results of any sort based on a direct comparison of character code points will only match query results that are ordered using an identity collating sequence.
Your choice of database collating sequence can affect federated system performance. If a data source uses the same collating sequence as the DB2 federated database, DB2 can push down order-dependent processing involving character data to the data source. If a data source collating sequence does not match the DB2 collating sequence, data is retrieved, and all order-dependent processing on character data is done locally (this can reduce performance).
To determine whether a data source and DB2 have the same collating sequence, consider the following:
The collating sequence is related to the language supported on a server. Compare DB2 NLS information to data source NLS information.
Some data sources are created using case-insensitive collating sequences, which can yield results that are different from DB2 in order-dependent operations.
Some data sources provide multiple options for collating sequences, or allow the collating sequence to be customized.
Choose the collating sequence for a DB2 federated database based on the mix of data sources that will be accessed from that database. For example:
For information about setting up an MVS collating sequence, refer to the Administrative API Reference for examples under the description of the sqlecrea - Create Database API. These examples contain collation tables for the EBCIDIC 500, 37, and 5026/5035 code pages.
After you set the collating sequence for the DB2 database, ensure that you set the collating_sequence server option for each data source server. This option specifies whether the collating sequence of a given data source server matches the collating sequence of the DB2 database.
Set the collating_sequence option to "Y" if the collating sequences match. This setting allows the DB2 optimizer to consider order-dependent processing at a data source, which can improve performance. However, if the data source collating sequence is not the same as the DB2 database collating sequence, you may receive incorrect results. For example, if your plan uses merge joins, the DB2 optimizer will push down ordering operations to the data sources as much as possible. If the data source collating sequence is not the same, the join result set may not be correct.
Set the collating_sequence option to "N" if the collating sequences do not match. Use this value when data source collating sequences differ from DB2, or when the data source collating operations might be case insensitive. For example, in a case-insensitive data source with an English code page, TOLLESON, ToLLeSoN, and tolleson would all be considered equal. Set the collating_sequence option to "N" if you are not sure whether the collating sequence at the data source is identical to the DB2 collating sequence.
The datetime data types are described below. Although datetime values can be used in certain arithmetic and string operations, and are compatible with certain strings, they are neither strings nor numbers.
A date is a three-part value (year, month, and day). The range of the year part is 0001 to 9999. The range of the month part is 1 to 12. The range of the day part is 1 to x, where x depends on the month.
The internal representation of a date is a string of 4 bytes. Each byte consists of 2 packed decimal digits. The first 2 bytes represent the year, the third byte the month, and the last byte the day.
The length of a DATE column, as described in the SQLDA, is 10 bytes, which is the appropriate length for a character string representation of the value.
A time is a three-part value (hour, minute, and second) designating a time of day under a 24-hour clock. The range of the hour part is 0 to 24. The range of the other parts is 0 to 59. If the hour is 24, the minute and second specifications are zero.
The internal representation of a time is a string of 3 bytes. Each byte is 2 packed decimal digits. The first byte represents the hour, the second byte the minute, and the last byte the second.
The length of a TIME column, as described in the SQLDA, is 8 bytes, which is the appropriate length for a character string representation of the value.
A time stamp is a seven-part value (year, month, day, hour, minute, second, and microsecond) designating a date and a time of day as defined above, except that the time includes the specification of microseconds.
The internal representation of a time stamp is a string of 10 bytes. Each byte is 2 packed decimal digits. The first 4 bytes represent the date, the next 3 bytes the time, and the last 3 bytes the microseconds.
The length of a TIMESTAMP column, as described in the SQLDA, is 26 bytes, which is the appropriate length for a character string representation of the value.
Values whose data types are DATE, TIME, or TIMESTAMP are represented in an internal form that is transparent to the SQL user. Dates, times, and time stamps can also, however, be represented by character strings, and these representations directly concern the SQL user, because there are no constants or variables whose data types are DATE, TIME, or TIMESTAMP. Thus, to be retrieved, a datetime value must be assigned to a character string variable. The character string representation is normally the default format of datetime values associated with the country code of the client, unless overridden by specification of the "F" format option when the program is precompiled or bound to the database. For a list of the string formats for the various country codes, see Table 94.
When a valid string representation of a datetime value is used in an operation with an internal datetime value, the string representation is converted to the internal form of the date, time, or time stamp before the operation is performed. Valid string representations of datetime values are defined in the following sections.
A string representation of a date is a character string that starts with a digit and has a length of at least 8 characters. Trailing blanks may be included; leading zeros may be omitted from the month part and the day part of the date.
Valid string formats for dates are listed in Table 92. Each format is identified by name, and includes an
associated abbreviation and an example of its use.
Table 92. Formats for String Representations of Dates
Format Name | Abbreviation | Date Format | Example |
---|---|---|---|
International Standards Organization | ISO | yyyy-mm-dd | 1991-10-27 |
IBM USA standard | USA | mm/dd/yyyy | 10/27/1991 |
IBM European standard | EUR | dd.mm.yyyy | 27.10.1991 |
Japanese Industrial Standard Christian era | JIS | yyyy-mm-dd | 1991-10-27 |
Site-defined (Local) | LOC | Depends on database country code | -- |
A string representation of a time is a character string that starts with a digit and has a length of at least 4 characters. Trailing blanks may be included; a leading zero may be omitted from the hour part of the time, and seconds may be omitted entirely. If you choose to omit seconds, an implicit specification of 0 seconds is assumed. Thus, 13.30 is equivalent to 13.30.00.
Valid string formats for times are listed in Table 93. Each format is identified by name, and includes an
associated abbreviation and an example of its use.
Table 93. Formats for String Representations of Times
Format Name | Abbreviation | Time Format | Example |
---|---|---|---|
International Standards Organization | ISO | hh.mm.ss | 13.30.05 |
IBM USA standard | USA | hh:mm AM or PM | 1:30 PM |
IBM European standard | EUR | hh.mm.ss | 13.30.05 |
Japanese Industrial Standard Christian Era | JIS | hh:mm:ss | 13:30:05 |
Site-defined (Local) | LOC | Depends on application country code | -- |
Notes:
A string representation of a time stamp is a character string that starts with a digit and has a length of at least 16 characters. The complete string representation of a time stamp has the form yyyy-mm-dd-hh.mm.ss.nnnnnn. Trailing blanks may be included; leading zeros may be omitted from the month, day, or hour part of the time stamp, and microseconds may be truncated or omitted entirely. If you choose to omit any digit of the microseconds part, an implicit specification of 0 is assumed. Thus, 1991-3-2-8.30.00 is equivalent to 1991-03-02-08.30.00.000000.
Date and time stamp strings must contain only single-byte characters and digits.
The character string representation of date and time formats is the default format of datetime values associated with the country code of the application. This default format can be overridden by specifying the "F" format option when the program is precompiled or bound to the database.
Following is a description of the input and output formats for date and time:
Table 94. Date and Time Formats by Country Code
Country Code | Local Date Format | Local Time Format | Default Output Date Format | Input Date Formats |
---|---|---|---|---|
355 Albania | yyyy-mm-dd | JIS | LOC | LOC, USA, EUR, ISO |
785 Arabic | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
001 Australia (1) | mm-dd-yyyy | JIS | LOC | LOC, USA, EUR, ISO |
061 Australia | dd-mm-yyyy | JIS | LOC | LOC, USA, EUR, ISO |
032 Belgium | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
055 Brazil | dd.mm.yyyy | JIS | LOC | LOC, EUR, ISO |
359 Bulgaria | dd.mm.yyyy | JIS | EUR | LOC, USA, EUR, ISO |
001 Canada | mm-dd-yyyy | JIS | USA | LOC, USA, EUR, ISO |
002 Canada (French) | dd-mm-yyyy | ISO | ISO | LOC, USA, EUR, ISO |
385 Croatia | yyyy-mm-dd | JIS | ISO | LOC, USA, EUR, ISO |
042 Czech Republic | yyyy-mm-dd | JIS | ISO | LOC, USA, EUR, ISO |
045 Denmark | dd-mm-yyyy | ISO | ISO | LOC, USA, EUR, ISO |
358 Finland | dd/mm/yyyy | ISO | EUR | LOC, EUR, ISO |
389 FYR Macedonia | dd.mm.yyyy | JIS | EUR | LOC, USA, EUR, ISO |
033 France | dd/mm/yyyy | JIS | EUR | LOC, EUR, ISO |
049 Germany | dd/mm/yyyy | ISO | ISO | LOC, EUR, ISO |
030 Greece | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
036 Hungary | yyyy-mm-dd | JIS | ISO | LOC, USA, EUR, ISO |
354 Iceland | dd-mm-yyyy | JIS | LOC | LOC, USA, EUR, ISO |
091 India | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
972 Israel | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
039 Italy | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
081 Japan | mm/dd/yyyy | JIS | ISO | LOC, USA, EUR, ISO |
082 Korea | mm/dd/yyyy | JIS | ISO | LOC, USA, EUR, ISO |
001 Latin America (1) | mm-dd-yyyy | JIS | LOC | LOC, USA, EUR, ISO |
003 Latin America | dd-mm-yyyy | JIS | LOC | LOC, EUR, ISO |
031 Netherlands | dd-mm-yyyy | JIS | LOC | LOC, USA, EUR, ISO |
047 Norway | dd/mm/yyyy | ISO | EUR | LOC, EUR, ISO |
048 Poland | yyyy-mm-dd | JIS | ISO | LOC, USA, EUR, ISO |
351 Portugal | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
086 People's Republic of China | mm/dd/yyyy | JIS | ISO | LOC, USA, EUR, ISO |
040 Romania | yyyy-mm-dd | JIS | ISO | LOC, USA, EUR, ISO |
007 Russia | dd/mm/yyyy | ISO | LOC | LOC, EUR, ISO |
381 Serbia/ Montenegro | yyyy-mm-dd | JIS | ISO | LOC, USA, EUR, ISO |
042 Slovakia | yyyy-mm-dd | JIS | ISO | LOC, USA, EUR, ISO |
386 Slovenia | yyyy-mm-dd | JIS | ISO | LOC, USA, EUR, ISO |
034 Spain | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
046 Sweden | dd/mm/yyyy | ISO | ISO | LOC, EUR, ISO |
041 Switzerland | dd/mm/yyyy | ISO | EUR | LOC, EUR, ISO |
088 Taiwan | mm-dd-yyyy | JIS | ISO | LOC, USA, EUR, ISO |
066 Thailand (2) | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
090 Turkey | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
044 UK | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
001 USA | mm-dd-yyyy | JIS | USA | LOC, USA, EUR, ISO |
084 Vietnam | dd/mm/yyyy | JIS | LOC | LOC, EUR, ISO |
Notes:
|