Bidirectional script support

WebSphere Business Integration products provide bidirectional support for languages with a bidirectional script. A bidirectional script contains both text that is written from right to left and embedded numbers or segments of text in western scripts (for example, Latin-based scripts such as English or French, Cyrillic-based scripts, or Greek) that are written from left to right.

Arabic and Hebrew are the two major language groups that use bidirectional scripts. The Arabic script group includes Arabic, Farsi (Persian), and Urdu, as well as other languages. The Hebrew script group includes not only Hebrew, but also Yiddish and Ladino. Because both language groups have small alphabets (only 27 characters), they can be accommodated by a single-byte encoding scheme.

Bidirectional language characteristics

Two main characteristics distinguish a bidirectional script from a Western language script (such as English, French, German, Greek, and others): bidirectionality and shaping.

Bidirectionality

Bidirectionality consists of seven key concepts:

Note:
In all examples shown below, capitalized letters, such as DCBA, are used to represent Arabic or Hebrew letters.

Segmentation
Segmentation occurs when one portion of text within a string has a directionality that is distinct from the rest of the string. A script can therefore have a main portion with a right-to-left orientation while one or more other portions have a left-to-right orientation. An example of bidirectional segmentation is the street address, Entrance B 25 Maple Street. This address when written in Hebrew is: B ECNARTNE 25 TEERTS ELPAM. In this example, the major parts of the string text, B ECNARTNE and TEERTS ELPAM, have a right-to-left orientation but the number 25 has a left-to-right orientation.
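The segmentation of the address example can be sketched in a few lines of Python. This is only an illustration under this document's own convention (capital letters stand for Arabic or Hebrew letters, lowercase letters for Latin text); the function names classify and directional_runs are hypothetical, and attaching spaces to the preceding segment is a simplifying assumption, not a rule taken from any product.

```python
from itertools import groupby


def classify(ch):
    """Return 'R', 'L', or 'EN' for letters and digits, None for neutrals.

    Assumption: capitals stand in for right-to-left letters, as in the
    examples of this document.
    """
    if ch.isdigit():
        return "EN"
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return None


def directional_runs(text):
    """Split text into (direction, segment) pairs.

    Neutral characters such as spaces are attached to the segment that
    precedes them (a simplification for this sketch).
    """
    labels, current = [], None
    for ch in text:
        kind = classify(ch)
        if kind is not None:
            current = kind
        labels.append(current)
    return [
        (label, "".join(ch for _, ch in group))
        for label, group in groupby(zip(labels, text), key=lambda pair: pair[0])
    ]


runs = directional_runs("B ECNARTNE 25 TEERTS ELPAM")
for direction, segment in runs:
    print(direction, repr(segment))
```

The address splits into three segments: two right-to-left segments with the left-to-right number between them.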

Nesting
Nesting occurs when a text segment that has one directionality contains an additional segment with the opposite directionality. The street address B ECNARTNE 25 TEERTS ELPAM again serves as an example, because it has one level of nesting. The street name, TEERTS ELPAM, is written with a right-to-left orientation, but the flow is then reversed to allow the correct entry of the street number 25, which has a left-to-right orientation. After the street number, the flow orientation is reversed again to right to left for B ECNARTNE.
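One way to make nesting concrete is to assign each character an embedding level and then reverse nested segments from the deepest level outward, which is the general idea behind the level-based reordering in the Unicode Bidirectional Algorithm. The sketch below is a heavily simplified illustration: it assumes a right-to-left global orientation, uses capitals for right-to-left letters, and the names embedding_levels and reorder are hypothetical.

```python
def embedding_levels(logical):
    """Assign a level to each character, assuming a right-to-left base.

    Level 1: right-to-left letters (capitals here) and neutrals.
    Level 2: digits and left-to-right letters nested inside that text.
    """
    levels = []
    for ch in logical:
        nested = ch.isdigit() or (ch.isalpha() and ch.islower())
        levels.append(2 if nested else 1)
    return levels


def reorder(logical, levels):
    """Convert logical order to display order.

    Working from the highest level down to level 1, every contiguous
    run of characters at that level or higher is reversed.
    """
    chars, lv = list(logical), list(levels)
    if not chars:
        return ""
    for level in range(max(lv), 0, -1):
        i = 0
        while i < len(chars):
            if lv[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and lv[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            lv[i:j] = lv[i:j][::-1]  # keep levels aligned with characters
            i = j
    return "".join(chars)


typed = "MAPLE STREET 25 ENTRANCE B"  # the order in which the text is typed
levels = embedding_levels(typed)
displayed = reorder(typed, levels)
nesting_depth = max(levels) - 1
print(displayed, nesting_depth)
```

With these assumptions the typed address is displayed as B ECNARTNE 25 TEERTS ELPAM, and the number 25 is the single nested level.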

Global orientation
Global orientation, which is also referred to as Writing Order, Reading Order, or Paragraph Direction, designates the side of the screen, window, or page where the writing of the text starts. In addition, global orientation is context dependent. This means that the meaning of the text depends on its context. An example of context dependency is demonstrated using a sentence written as a bidirectional script. The sentence reads:
FRED DOES NOT BELIEVE taht yas syawla i.

This sentence has one meaning when it is read from left to right (Fred does not believe I always say that), and another meaning when it is read from right to left (I always say that Fred does not believe). Because the global orientation is not always obvious from the context, the application developer must be aware of how the text will be read (left to right or right to left).
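When the global orientation is not supplied by the application, a common fallback is to guess it from the first character that has a strong direction. The sketch below illustrates that heuristic with Python's standard unicodedata module; the function name base_direction and the default of left to right are assumptions made for this example, and the heuristic cannot resolve genuinely ambiguous text such as the sentence above.

```python
import unicodedata


def base_direction(text, default="ltr"):
    """Guess the global orientation from the first strong character.

    'L' is a strong left-to-right class; 'R' (for example Hebrew) and
    'AL' (Arabic letters) are strong right-to-left classes.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character found, for example digits only


print(base_direction("abc \N{HEBREW LETTER ALEF}"))
print(base_direction("12 \N{ARABIC LETTER BEH} abc"))
```

Because a guess of this kind can be wrong, an application that knows the intended reading order should state it explicitly rather than rely on detection.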

Physical and logical text ordering
Bidirectional text can be stored in either logical or physical order. In workstation environments, the preferred means for entering and processing bidirectional text is logical order, because the text is then processed similarly to Latin text. When using logical order in storage, you must provide the means to reverse segments whose direction is opposite to the global orientation. For example, if the global orientation is left to right (as in English), then segments in Arabic and Hebrew need special processing to appear in their native right-to-left direction. Conversely, on mainframes the preferred way to store, enter, and process bidirectional text is physical order. Therefore, when integrating bidirectional text from mainframe and workstation environments, you must transform the text to a layout where all the text has the same text order.

When using text order in bidirectional scripts, physical and logical order are also important. Physical order refers to how the text segment is physically presented, and logical order refers to how text segments are typed (or pronounced if read aloud). Depending on the situation, some segments may need to be reordered into either logical or physical order. For example, consider the statement:

my wife's name is ILIN

Overall, this statement has a left-to-right orientation. The reader reads the text with the first letter being m, followed by y, and so forth. In the physical order, the letters i and s are followed by the letter I of the segment containing ILIN. In Hebrew, however, ILIN is pronounced NILI. Therefore, in the logical order, the first letter of the name segment is N, not I.
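For text with a left-to-right global orientation, the conversion between logical and physical order amounts to reversing each right-to-left segment in place. The following sketch shows this for the statement above, again using capitals to stand for Hebrew letters; the name swap_order_ltr is hypothetical, and the regular expression handles only this simplified convention (no numbers or punctuation inside the right-to-left segment).

```python
import re

# A right-to-left segment: one or more capitalized words separated by spaces.
_RTL_SEGMENT = re.compile(r"[A-Z]+(?:\s+[A-Z]+)*")


def swap_order_ltr(text):
    """Reverse every right-to-left segment of left-to-right text.

    The same function converts logical order to physical order and
    back, because reversing a segment twice restores it.
    """
    return _RTL_SEGMENT.sub(lambda match: match.group(0)[::-1], text)


logical = "my wife's name is NILI"
physical = swap_order_ltr(logical)
print(physical)
```

Applying the function to the physical form returns the logical form, which is the kind of transformation needed when text moves between a logical-order and a physical-order environment.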

Text-type
Text-type is defined as the most appropriate approach for recording a specific text. This means that different text-types require different recording techniques. There are three text-types used for recording: visual, implicit (logical), and explicit.

Visual text-type is the oldest form of recording, and is a simple copy of the entire screen. This form of text-type is dependent on the programmer knowing where the embedded segments are located and processing them accordingly. Many legacy applications and their files use this type of text.

Implicit text-type recording assumes that the letters of the Latin alphabet have inherent left-to-right directionality, and that Arabic, Persian, Urdu, and Hebrew alphabets have inherent right-to-left directionality. To accommodate bidirectionality, an algorithm of implicit text processing is used to recognize segments based on their inherent directional characteristics, allowing segment inversion to be performed automatically. The main limitation of implicit text-typing is the inability to handle strings where numbers and letters (in both left-to-right and right-to-left) are mixed, such as in the case of part numbers.
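The inherent directionality that implicit processing relies on is recorded for every character in the Unicode character database, which Python exposes through unicodedata.bidirectional. The sketch below lists those classes for a short mixed string that resembles a part number; the function name inherent_directions and the sample string are made up for illustration.

```python
import unicodedata

STRONG = {"L", "R", "AL"}  # classes that carry an inherent direction


def inherent_directions(text):
    """Return (character, bidi class, kind) for each character."""
    result = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        kind = "strong" if bidi_class in STRONG else "weak or neutral"
        result.append((ch, bidi_class, kind))
    return result


# A hypothetical part number: a Hebrew letter, digits, a hyphen, a Latin letter.
sample = "\N{HEBREW LETTER ALEF}12-b"
for ch, bidi_class, kind in inherent_directions(sample):
    print(f"U+{ord(ch):04X} {bidi_class:3} {kind}")
```

Only the two letters are strong. The digits and the hyphen take their direction from the surrounding text, which is why an implicit algorithm can display a mixed part number in an order the author did not intend.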

Explicit text-type recording assumes that there are additional control characters embedded in a text string that direct the explicit algorithm to perform segment inversions, shaping or numeral selections, and other transformations. The limitation of explicit text-typing is the need for automatic processes to handle the embedded controls. There is a specific technique that bridges implicit and explicit text-typing. This technique is the basic display algorithm that is defined in the Unicode Standard (the Unicode Bidirectional Algorithm).
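In Unicode text, the embedded controls are ordinary characters such as LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING, and POP DIRECTIONAL FORMATTING. The following sketch wraps a segment in such controls and removes them again; the helper names embed and strip_controls are hypothetical, and only three of the available control characters are covered.

```python
LRE = "\N{LEFT-TO-RIGHT EMBEDDING}"
RLE = "\N{RIGHT-TO-LEFT EMBEDDING}"
PDF = "\N{POP DIRECTIONAL FORMATTING}"


def embed(segment, direction):
    """Mark a segment with an explicit direction ('ltr' or 'rtl')."""
    opener = RLE if direction == "rtl" else LRE
    return f"{opener}{segment}{PDF}"


def strip_controls(text):
    """Remove the embedding controls, for processes that cannot handle them."""
    return text.translate({ord(control): None for control in (LRE, RLE, PDF)})


# A hypothetical part number that must keep its left-to-right order.
marked = embed("AB-12", "ltr")
print([f"U+{ord(ch):04X}" for ch in marked])
print(strip_controls(marked))
```

The strip_controls helper reflects the limitation noted above: every automatic process that touches the text must either understand the controls or remove them consistently.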

Symmetrical swapping
Symmetrical swapping is the ability to handle characters such as <, (, [, and {, which have a complementary symmetric character with an opposite directional meaning: >, ), ], and }. These characters are problematic because reversing the text without swapping would, for example, change A > B to B > A. Symmetrical swapping also converts the symbol, so that the result is B < A.
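The difference between a plain reversal and a reversal with symmetrical swapping can be shown with a small translation table. This is a minimal sketch that covers only the four pairs named above; the function names are hypothetical. The Unicode character database flags such characters as mirrored, which unicodedata.mirrored reports.

```python
import unicodedata

# Each character maps to its symmetric counterpart.
_SWAP = str.maketrans("<>()[]{}", "><)(][}{")


def reverse_plain(text):
    """Reverse the text without touching the symbols."""
    return text[::-1]


def reverse_with_swapping(text):
    """Reverse the text and swap each symmetric character."""
    return text[::-1].translate(_SWAP)


print(reverse_plain("A > B"))
print(reverse_with_swapping("A > B"))
print(all(unicodedata.mirrored(ch) for ch in "<>()[]{}"))
```

Plain reversal yields B > A, which inverts the meaning, while the swapped version yields B < A and keeps the original relationship.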

Widget mirroring

Widget mirroring of a translated GUI mirrors the GUI to match the directionality of the language. For example, widget mirroring can move the menu buttons and navigation tree to the right instead of the left. Otherwise, the frames and windows are not mirrored. Figure 44 shows widget mirroring of a drop-down menu.

Figure 44. Widget-mirrored window showing bidirectional labels
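At the layout level, mirroring a widget comes down to reflecting its horizontal position within its container. The sketch below is a generic illustration and does not describe how any particular GUI toolkit or WebSphere product implements mirroring; the Widget class, its fields, and the pixel values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Widget:
    name: str
    x: int      # distance of the left edge from the container's left edge
    width: int


def mirror(widget, container_width):
    """Reflect a widget horizontally inside its container.

    The distance that used to separate the widget from the left edge
    now separates it from the right edge.
    """
    mirrored_x = container_width - widget.x - widget.width
    return Widget(widget.name, mirrored_x, widget.width)


navigation = Widget("navigation tree", x=0, width=200)
print(mirror(navigation, container_width=800))
```

A navigation tree that sits at the left edge of an 800-unit window ends up at the right edge, and mirroring it a second time restores the original position.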

Shaping

Shaping is a characteristic of many complex languages, particularly the cursive languages Arabic and Hebrew. A writing system is cursive if it has adjacent characters in a word connected to each other, and is more suited to handwriting than to printing. In Arabic, for example, some letters can only connect to the letter on their right. In addition, letters can assume different shapes according to their position in the word and the connective properties of adjacent letters. These points make shaping important in rendering bidirectional text intelligible. The shaping process renders characters to their appropriate presentation forms by replacing an abstract representation of a character with the proper shape. This is accomplished by using the base form of a character to allow the selection of a particular cursive character without specifying its shape.

The proper shape of a character is then selected by a shape determination routine that allows for automatic (algorithmic) selection of the appropriate shape according to the context directed by either the software or the user. In most cases, the basic shapes of a cursive language text are stored. There are two other characteristics that make up shaping. They are character composition and national numbers.
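A shape determination routine of this kind can be sketched with the Arabic presentation forms that Unicode defines for compatibility. The code below is a simplified illustration, not the routine used by any product: it infers whether a letter can connect to its neighbours from whether an INITIAL FORM or FINAL FORM character exists for it, it ignores ligatures and diacritics, and the names _form and shape are hypothetical.

```python
import unicodedata


def _form(base, form):
    """Look up a presentation form such as 'INITIAL' for a base letter.

    Returns None when the letter has no such form.
    """
    name = f"{unicodedata.name(base, '')} {form} FORM"
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


def shape(word):
    """Replace base letters (in logical order) with contextual shapes."""
    shaped = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else None
        nxt = word[i + 1] if i + 1 < len(word) else None
        # A connection needs a letter that can join forward (it has an
        # initial form) followed by one that can join backward (final form).
        joins_prev = bool(prev and _form(prev, "INITIAL") and _form(ch, "FINAL"))
        joins_next = bool(nxt and _form(ch, "INITIAL") and _form(nxt, "FINAL"))
        if joins_prev and joins_next:
            form = "MEDIAL"
        elif joins_prev:
            form = "FINAL"
        elif joins_next:
            form = "INITIAL"
        else:
            form = "ISOLATED"
        shaped.append(_form(ch, form) or ch)
    return "".join(shaped)


word = "\N{ARABIC LETTER BEH}\N{ARABIC LETTER ALEF}\N{ARABIC LETTER BEH}"
for ch in shape(word):
    print(unicodedata.name(ch))
```

In this word the first BEH takes its initial form, the ALEF takes its final form, and the last BEH stays isolated because ALEF cannot connect to the letter that follows it.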

Character composition is defined as the correspondence between the number of text characters stored and the number of text characters presented. To maintain correspondence, devices such as ligatures and diacritics are used. Ligatures are used when two or more characters can be represented by a single character that occupies one presentation cell. Diacritics in bidirectional scripts are marks located in a certain orientation to a consonant (either above, within, below, or near it) to represent vowels. When these marks are stored, they occupy physical positions, but when they are presented, they can occupy the same cell as the associated consonants. In Arabic, spacing diacritics are currently implemented as separate characters that are rendered following the character to which the diacritics belong.
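The gap between stored characters and presentation cells can be illustrated by counting both. The sketch below assumes Unicode text in which diacritics are combining (non-spacing) marks, and it handles a single ligature, LAM followed by ALEF, as an example; the function name presentation_cells is hypothetical and real rendering engines apply many more rules.

```python
import unicodedata

LAM_ALEF = "\N{ARABIC LETTER LAM}\N{ARABIC LETTER ALEF}"
LAM_ALEF_LIGATURE = "\N{ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM}"


def presentation_cells(stored):
    """Count the cells needed to present the stored characters.

    The LAM + ALEF pair collapses into one ligature cell, and combining
    diacritics share the cell of the consonant they belong to.
    """
    text = stored.replace(LAM_ALEF, LAM_ALEF_LIGATURE)
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


samples = {
    "consonant with vowel mark": "\N{ARABIC LETTER BEH}\N{ARABIC FATHA}",
    "lam-alef ligature": LAM_ALEF,
}
for label, stored in samples.items():
    print(label, len(stored), presentation_cells(stored))
```

Both samples are stored as two characters but are presented in a single cell.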

National numbers also need special treatment because they are used differently in different languages. For example, in Hebrew, numbers are represented using Arabic digits (1, 2, 3, ... 0). However, cursive languages such as Arabic, Farsi, and Urdu have their own national glyphs to represent digits. The digits used in cursive languages are labeled either Hindi digits or Arabic-Indic digits. Whichever digits are used (Arabic, Hindi, or Arabic-Indic), all simple numbers are presented left to right, but mathematical formulas can differ from language to language. For example, in Arabic, mathematical formulas are written left to right, but in Persian, they are written right to left. Therefore, while numbers are usually encoded as Arabic digits, they can be presented as either nation-specific glyphs or Arabic digits according to the intent of the user or developer.
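Presenting encoded Arabic digits as national glyphs can be sketched as a simple character translation. The example assumes Unicode text, where the Arabic-Indic digits start at U+0660 and the extended Arabic-Indic digits (used, for example, for Persian and Urdu) start at U+06F0; the function name to_national and the table keys are hypothetical.

```python
def _digit_table(zero_code_point):
    """Map the digits 0-9 to the ten digits that start at zero_code_point."""
    return {ord(str(d)): chr(zero_code_point + d) for d in range(10)}


TABLES = {
    "european": {},                           # keep the digits as stored
    "arabic-indic": _digit_table(0x0660),
    "extended-arabic-indic": _digit_table(0x06F0),
}


def to_national(text, system):
    """Present the stored digits with the glyphs of the given system."""
    return text.translate(TABLES[system])


stored = "2004"
presented = to_national(stored, "arabic-indic")
print(presented)
print(int(presented))  # Python still reads the national digits as a number
```

Only the glyphs change. The order of the digits stays the same, which is consistent with simple numbers being presented left to right in every case.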

Copyright IBM Corp. 1997, 2004