SQL Reference

Character Conversion

A string is a sequence of bytes that may represent characters. Within a string, all the characters are represented by a common coding representation. In some cases, it might be necessary to convert these characters to a different coding representation. The process of conversion is known as character conversion. (See footnote 12.)

Character conversion can occur when an SQL statement is executed remotely. Consider, for example, these two cases:

In either case, the string could have a different representation at the sending and receiving systems. Conversion can also occur during string operations on the same system.
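As a rough illustration of what conversion between a sending and a receiving system involves, the following Python sketch re-encodes a byte string from code page 850 to code page 037 (an EBCDIC code page). The helper name convert_code_page and the choice of code pages are assumptions made for this example; it relies on Python's built-in codecs and does not describe how the database manager performs conversion internally.

```python
def convert_code_page(data: bytes, source_cp: str, target_cp: str) -> bytes:
    """Re-encode a byte string from one code page to another.

    Hypothetical helper for illustration only.
    """
    # Decode the bytes into abstract characters using the sender's code page,
    # then encode those characters using the receiver's code page.
    return data.decode(source_cp).encode(target_cp)


sent = "AB".encode("cp850")  # bytes as held by the sending system
received = convert_code_page(sent, "cp850", "cp037")

print(sent.hex())      # 4142
print(received.hex())  # c1c2
```

The characters are the same on both systems; only the bytes that represent them differ.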

The following list defines some of the terms used when discussing character conversion.

character set
A defined set of characters.

For example, the following character set appears in several code pages:

code page
A set of assignments of characters to code points.

In the ASCII encoding scheme for code page 850, for example, 'A' is assigned code point X'41' and 'B' is assigned code point X'42'. Within a code page, each code point has only one specific meaning. A code page is an attribute of the database. When an application program connects to the database, the database manager determines the code page of the application.
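The assignments quoted above can be checked with a short Python sketch. The helper name code_point is hypothetical, and Python's cp850 codec is assumed here to stand in for code page 850.

```python
def code_point(char: str, code_page: str) -> str:
    """Return the code point of a character in X'..' notation.

    Hypothetical helper for illustration only.
    """
    # Encoding a character with a code page yields the byte(s) assigned to it.
    encoded = char.encode(code_page)
    return "X'" + encoded.hex().upper() + "'"


print(code_point("A", "cp850"))  # X'41'
print(code_point("B", "cp850"))  # X'42'

# The mapping also works in the other direction: code point to character.
print(bytes([0x41]).decode("cp850"))  # A
```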

code point
A unique bit pattern that represents a character.

encoding scheme
A set of rules used to represent character data.

For example:

Character Sets and Code Pages

The following example shows how a typical character set might map to different code points in two different code pages.



[Figure: a character set mapped to code points in two different code pages]

Even with the same encoding scheme, there are many different code pages, and the same code point can represent a different character in different code pages. Furthermore, a byte in a character string does not necessarily represent a character from a single-byte character set (SBCS). Character strings are also used for mixed and bit data. Mixed data is a mixture of single-byte, double-byte, or multi-byte characters. Bit data (columns defined as FOR BIT DATA, BLOBs, or binary strings) is not associated with any character set.
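Both points can be sketched in Python under a few assumptions: the codecs cp850 and latin-1 stand in for two single-byte code pages, cp932 stands in for a mixed single-byte and double-byte code page, and the specific code point X'E9' is chosen only for illustration.

```python
import unicodedata

# One code point, two code pages, two different characters.
code_point = bytes([0xE9])
in_cp850 = code_point.decode("cp850")
in_latin1 = code_point.decode("latin-1")

print(unicodedata.name(in_cp850))   # LATIN CAPITAL LETTER U WITH ACUTE
print(unicodedata.name(in_latin1))  # LATIN SMALL LETTER E WITH ACUTE

# Mixed data: a single-byte character followed by a double-byte character.
mixed = "A\u3042".encode("cp932")  # Latin 'A', then Hiragana 'a'
print(len(mixed))                  # 3 bytes for 2 characters
```

Because the byte count and the character count differ for mixed data, a byte offset into such a string cannot be assumed to fall on a character boundary.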

Code Page Attributes

The database manager determines code page attributes for all character strings when an application is bound to a database. The potential code page attributes are:

The Database Code Page
The database code page stored in the database configuration files. This code page value is determined when the database is created and cannot be altered.

The Application Code Page
The code page under which the application is executed. Note that this is not necessarily the same code page under which the application was bound. (See the Application Development Guide for further information on binding and executing application programs.)

Code Page 0
This represents a string that is derived from an expression that contains a FOR BIT DATA or BLOB value.

String Code Page Attributes

Character string code page attributes are as follows:

A set of rules is used to determine the code page attributes for operations that combine string objects, such as the results of scalar operations, concatenation, or set operations. At execution time, code page attributes are used to determine any requirements for code page conversions of strings.
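The role of code page attributes in deciding whether conversion is required can be pictured with the following Python sketch. It is a deliberate simplification and an assumption of this example, not the database manager's actual rule set: the function name needs_conversion is hypothetical, and the only rules modeled are that bit data (code page 0) is never converted and that strings with identical code pages need no conversion.

```python
BIT_DATA_CODE_PAGE = 0  # code page 0 marks a string derived from bit data


def needs_conversion(source_cp: int, target_cp: int) -> bool:
    """Decide whether a string must be converted between two code pages.

    Hypothetical simplification for illustration only.
    """
    # Bit data is not associated with any character set, so it is left as is.
    if BIT_DATA_CODE_PAGE in (source_cp, target_cp):
        return False
    # Strings that already share a code page need no conversion.
    return source_cp != target_cp


print(needs_conversion(850, 850))  # False
print(needs_conversion(850, 37))   # True
print(needs_conversion(0, 850))    # False
```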

For more details on character conversion, see:


Footnotes:

12
Character conversion, when required, is automatic and is transparent to the application when it is successful. A knowledge of conversion is therefore unnecessary when all the strings involved in a statement's execution are represented in the same way. This is frequently the case for stand-alone installations and for networks within the same country. Thus, for many readers, character conversion may be irrelevant.

