WebSphere Business Integration products provide bidirectional support for languages with a bidirectional script. A bidirectional script contains both text that is written from right to left and embedded numbers or segments of text in Western scripts (for example, Latin-based scripts such as English and French, Cyrillic-based scripts, or Greek) that are written from left to right.
Arabic and Hebrew are the two major language groups that use bidirectional scripts. The Arabic script group includes Arabic, Farsi (Persian), and Urdu, as well as other languages. The Hebrew script group includes not only Hebrew, but also Yiddish and Ladino. Because both language groups use small alphabets (only 27 characters), they can be accommodated by a single-byte encoding scheme.
Two main characteristics distinguish a bidirectional script from a Western language script (such as English, French, German, or Greek): bidirectionality and shaping.
Bidirectionality consists of seven key concepts:
FRED DOES NOT BELIEVE taht yas syawla i.
This sentence has one meaning when it is read from left to right (Fred does not believe I always say that) and another meaning when it is read from right to left (I always say that Fred does not believe). Because the global orientation is not always obvious from the context, the application developer must be aware of how the text will be read (left to right or right to left).
When working with text order in bidirectional scripts, physical and logical order are also important. Physical order refers to how the text segments are physically presented, and logical order refers to how the text segments are typed (or pronounced if read aloud). Depending on the situation, some segments may need to be reordered into either logical or physical order. For example, consider the statement:
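The effect of global orientation on the example sentence can be sketched in a few lines of Python. This is only an illustration: the function name `reading_order` and the representation of a line as a list of (text, direction) pairs are assumptions made for the sketch, not part of any product API. As in the example, uppercase stands for left-to-right text and lowercase for right-to-left text shown as it appears on the screen.

```python
def reading_order(segments, global_orientation):
    """Return the text in the order a reader would encounter it.

    segments: (text, direction) pairs listed as they appear on the line
    from left to right; direction is "ltr" or "rtl".
    global_orientation: "ltr" or "rtl", the direction in which the reader
    moves from one segment to the next.
    """
    if global_orientation == "rtl":
        # The reader starts with the rightmost segment.
        segments = list(reversed(segments))
    parts = []
    for text, direction in segments:
        # A right-to-left segment is read starting from its right edge.
        parts.append(text[::-1] if direction == "rtl" else text)
    return " ".join(parts)


# Hypothetical segmentation of the example sentence.
line = [("FRED DOES NOT BELIEVE", "ltr"), ("taht yas syawla i", "rtl")]
print(reading_order(line, "ltr"))
print(reading_order(line, "rtl"))
```

The same physical line yields two different readings, which is why the global orientation has to be known rather than guessed from the characters alone.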
my wife's name is ILIN
Overall, this statement has a left-to-right orientation. The reader reads the text with the first letter being m, followed by y, and so forth. In the physical order, the letters i and s are followed by the letter I of the segment containing ILIN. In Hebrew, however, ILIN is pronounced NILI, so in the logical order the first letter of the name segment is N, not I.
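The relationship between the two orders in this example can be sketched as follows. The helper name `swap_order` and the segment list are hypothetical, and the sketch assumes a line with left-to-right global orientation whose segments are already identified; under that assumption the segments keep their positions and only the characters inside right-to-left segments are reversed.

```python
def swap_order(segments):
    """Convert a left-to-right line between logical and physical order.

    segments: (text, direction) pairs, direction being "ltr" or "rtl".
    Reversing the right-to-left segments turns logical order into
    physical order, and applying the function again turns it back.
    """
    return [
        (text[::-1] if direction == "rtl" else text, direction)
        for text, direction in segments
    ]


def as_text(segments):
    return "".join(text for text, _ in segments)


# Logical order: the name is stored as it is typed and pronounced.
logical = [("my wife's name is ", "ltr"), ("NILI", "rtl")]
physical = swap_order(logical)

print(as_text(physical))              # the line as it is presented
print(as_text(swap_order(physical)))  # back to the logical order
```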
Visual text-type is the oldest form of recording, and is a simple copy of the entire screen. This form of text-type is dependent on the programmer knowing where the embedded segments are located and processing them accordingly. Many legacy applications and their files use this type of text.
Implicit text-type recording assumes that the letters of the Latin alphabet have inherent left-to-right directionality, and that the Arabic, Persian, Urdu, and Hebrew alphabets have inherent right-to-left directionality. To accommodate bidirectionality, an implicit text-processing algorithm recognizes segments based on their inherent directional characteristics, allowing segment inversion to be performed automatically. The main limitation of implicit text-typing is its inability to handle strings in which numbers and letters (from both left-to-right and right-to-left scripts) are mixed, such as part numbers.
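Segment recognition by inherent directionality can be illustrated with the bidirectional classes that Python's standard `unicodedata` module exposes. This is a deliberately simplified sketch, not the algorithm used by the products: the function name `directional_runs` is hypothetical, and characters without a strong direction (digits, spaces, punctuation) are simply attached to the run in progress.

```python
import unicodedata

# Unicode bidirectional classes that carry a strong direction.
STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}


def directional_runs(text):
    """Split text into runs of characters with the same strong direction."""
    runs = []  # each entry is [run_text, direction]
    for ch in text:
        direction = STRONG.get(unicodedata.bidirectional(ch))
        if not runs:
            runs.append([ch, direction])
        elif direction is None or direction == runs[-1][1]:
            # Neutral or same-direction character: extend the current run.
            runs[-1][0] += ch
        elif runs[-1][1] is None:
            # A run that began with neutrals takes the first strong direction.
            runs[-1][0] += ch
            runs[-1][1] = direction
        else:
            runs.append([ch, direction])
    return [tuple(run) for run in runs]


# Latin text followed by a Hebrew name (written with escapes for clarity).
print(directional_runs("name: \u05e0\u05d9\u05dc\u05d9"))

# Digits carry no strong direction, which is why mixed strings such as
# part numbers are hard to segment from the characters alone.
print([unicodedata.bidirectional(ch) for ch in "A1\u05d0"])
```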
Explicit text-type recording assumes that additional control characters are embedded in a text string to direct the explicit algorithm to perform segment inversions, shaping or numeral selections, and other transformations. The limitation of explicit text-typing is that automatic processes must be able to handle the embedded controls. A specific technique bridges implicit and explicit text-typing: the basic display algorithm that is defined in the Unicode Standard as the Unicode Bidi algorithm.
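As an illustration of embedded controls, the sketch below uses three directional formatting characters defined by the Unicode Standard. Whether a given product uses these particular controls is an assumption of the example, and the helper names `embed` and `strip_controls` are hypothetical.

```python
import unicodedata

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
CONTROLS = {LRE, RLE, PDF}


def embed(segment, direction):
    """Wrap a segment in explicit controls that state its direction."""
    opener = RLE if direction == "rtl" else LRE
    return f"{opener}{segment}{PDF}"


def strip_controls(text):
    """Remove the controls, for a process that cannot handle them."""
    return "".join(ch for ch in text if ch not in CONTROLS)


# A part number whose direction cannot be inferred implicitly.
marked = embed("ABC-123", "ltr")
print([unicodedata.name(ch) for ch in marked if ch in CONTROLS])
print(len(marked), len(strip_controls(marked)))
```

The marked string is two characters longer than the visible text, which shows the cost described above: every process that touches the string must either understand the controls or remove them.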
Widget mirroring of a translated GUI mirrors the GUI to match the directionality of the language. For example, widget mirroring can move the menu buttons and navigation tree to the right instead of the left. Otherwise, the frames and windows are not mirrored. Figure 44 shows widget mirroring of a drop-down menu.
Figure 44. Widget-mirrored window showing bidirectional labels
Shaping is a characteristic of many complex languages, particularly the cursive languages Arabic and Hebrew. A writing system is cursive if it has adjacent characters in a word connected to each other, and is more suited to handwriting than to printing. In Arabic, for example, some letters can only connect to the letter on their right. In addition, letters can assume different shapes according to their position in the word and the connective properties of adjacent letters. These points make shaping important in rendering bidirectional text intelligible. The shaping process renders characters to their appropriate presentation forms by replacing an abstract representation of a character with the proper shape. This is accomplished by using the base form of a character to allow the selection of a particular cursive character without specifying its shape.
The proper shape of a character is then selected by a shape determination routine that allows for automatic (algorithmic) selection of the appropriate shape according to the context directed by either the software or the user. In most cases, the basic shapes of a cursive language text are stored. There are two other characteristics that make up shaping. They are character composition and national numbers.
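A shape determination routine of this kind can be sketched with the Arabic presentation forms that Unicode encodes in the range U+FE70 to U+FEFF. This is a minimal illustration under stated assumptions, not the routine used by the products: the function names are hypothetical, the joining behaviour of each letter is inferred from which presentation forms exist for it, and ligatures and diacritics are ignored.

```python
import unicodedata


def build_form_table():
    """Map each base letter to its presentation forms, keyed by position."""
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep single-letter forms such as "<initial> 0628"; skip ligatures.
        if len(parts) == 2 and parts[0].startswith("<"):
            base = chr(int(parts[1], 16))
            forms.setdefault(base, {})[parts[0].strip("<>")] = chr(cp)
    return forms


FORMS = build_form_table()


def joins_next(ch):
    """A letter can connect to the following letter if it has an initial form."""
    return "initial" in FORMS.get(ch, {})


def joins_previous(ch):
    """A letter can connect to the preceding letter if it has a final form."""
    return "final" in FORMS.get(ch, {})


def shape(word):
    """Replace each stored base letter with its contextual shape."""
    shaped = []
    for i, ch in enumerate(word):
        table = FORMS.get(ch)
        if table is None:
            shaped.append(ch)  # not an Arabic letter with shaped forms
            continue
        before = i > 0 and joins_next(word[i - 1]) and joins_previous(ch)
        after = i + 1 < len(word) and joins_next(ch) and joins_previous(word[i + 1])
        if before and after:
            tag = "medial"
        elif before:
            tag = "final"
        elif after:
            tag = "initial"
        else:
            tag = "isolated"
        shaped.append(table.get(tag, ch))
    return "".join(shaped)


# BEH, ALEF, BEH stored as base forms; ALEF connects only to the letter
# that precedes it in logical order, so the last BEH stands alone.
for ch in shape("\u0628\u0627\u0628"):
    print(unicodedata.name(ch))
```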
Character composition is defined as the correspondence between the number of text characters stored and the number of text characters presented. To maintain correspondence, devices such as ligatures and diacritics are used. Ligatures are used when two or more characters can be represented by a single character that occupies one presentation cell. Diacritics in bidirectional scripts are marks placed in a certain orientation to a consonant (above, within, below, or near it) to represent vowels. When these marks are stored, they occupy physical positions, but when they are presented, they can occupy the same cell as the associated consonants. In Arabic, spacing diacritics are currently implemented as separate characters that are rendered following the character to which the diacritics belong.
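The difference between stored characters and presented cells can be made concrete with a short sketch. It assumes that diacritics are encoded as Unicode combining (non-spacing) marks that share a cell with their consonant, which does not cover the spacing diacritics mentioned above; the function name `presentation_cells` is hypothetical.

```python
import unicodedata


def presentation_cells(text):
    """Count cells, letting each combining mark share its consonant's cell."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# Hebrew word stored with vowel points: four consonants, three marks.
pointed = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
print(len(pointed), presentation_cells(pointed))

# A ligature works the other way round: the single presentation character
# LAM WITH ALEF stands for two stored letters.
ligature = "\ufefb"
stored = unicodedata.normalize("NFKC", ligature)
print(unicodedata.name(ligature))
print([unicodedata.name(ch) for ch in stored])
```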
National numbers also need special treatment because they are used differently in different languages. For example, in Hebrew, numbers are represented using Arabic digits (1,2,3...0). However, cursive languages such as Arabic, Farsi and Urdu have their own national glyphs to represent digits. The label for digits used in cursive languages is either Hindi or Arabic-Indic digits. Whether in Arabic, Hindi or Arabic-Indic, all simple numbers are presented left to right, but mathematical formulas can differ from language to language. For example, in Arabic, mathematical formulas are written left to right, but in Persian, they are written right to left. Therefore, while numbers are usually encoded as Arabic digits, they can be presented as either nation-specific glyphs or Arabic digits according to the intent of the user or developer.
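Keeping numbers encoded as Arabic digits (0 to 9) and choosing the glyphs only at presentation time can be sketched as below. The Unicode ranges are the standard Arabic-Indic digits (U+0660 to U+0669) and the extended set used for Persian and Urdu (U+06F0 to U+06F9); the function names are hypothetical, and the sketch covers simple numbers only, not mathematical formulas.

```python
import unicodedata

ARABIC_DIGITS = "0123456789"
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + i) for i in range(10))


def present_digits(text, national_digits):
    """Replace encoded digits with national glyphs for presentation."""
    return text.translate(str.maketrans(ARABIC_DIGITS, national_digits))


def encode_digits(text):
    """Map any decimal digit back to the encoded digits 0 to 9."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)


stored = "2024"
shown = present_digits(stored, ARABIC_INDIC)
print([unicodedata.name(ch) for ch in shown])
print(encode_digits(shown) == stored)
```

The digit order does not change in either direction, because simple numbers are presented left to right regardless of the glyph set.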