Defines | |||
#define | DOCXX_TAG |
#define | BIDI_SAMPLE_CODE |
#define | UBIDI_DEFAULT_LTR |
Paragraph level setting. | |
#define | UBIDI_DEFAULT_RTL |
Paragraph level setting. | |
#define | UBIDI_MAX_EXPLICIT_LEVEL |
Maximum explicit embedding level. | |
#define | UBIDI_LEVEL_OVERRIDE |
Bit flag for level input. | |
Typedefs | |||
typedef enum UBiDiDirection | UBiDiDirection |
typedef struct UBiDi | UBiDi |
Enumerations | |||
enum | UBiDiDirection { } |
UBiDiDirection values indicate the text direction. | |
Functions | |||
U_CAPI UBiDi* U_EXPORT2 | ubidi_open (void) |
Allocate a UBiDi structure. | |
U_CAPI UBiDi* U_EXPORT2 | ubidi_openSized (UTextOffset maxLength, UTextOffset maxRunCount, UErrorCode *pErrorCode) |
Allocate a UBiDi structure with preallocated memory for internal structures. | |
U_CAPI void U_EXPORT2 | ubidi_close (UBiDi *pBiDi) |
ubidi_close() must be called to free the memory associated with a UBiDi object. | |
U_CAPI void U_EXPORT2 | ubidi_setPara (UBiDi *pBiDi, const UChar *text, UTextOffset length, UBiDiLevel paraLevel, UBiDiLevel *embeddingLevels, UErrorCode *pErrorCode) |
Perform the Unicode BiDi algorithm. | |
U_CAPI void U_EXPORT2 | ubidi_setLine (const UBiDi *pParaBiDi, UTextOffset start, UTextOffset limit, UBiDi *pLineBiDi, UErrorCode *pErrorCode) |
ubidi_setLine() sets a UBiDi to contain the reordering information, especially the resolved levels, for all the characters in a line of text. | |
U_CAPI UBiDiDirection U_EXPORT2 | ubidi_getDirection (const UBiDi *pBiDi) |
Get the directionality of the text. | |
U_CAPI UTextOffset U_EXPORT2 | ubidi_getLength (const UBiDi *pBiDi) |
Get the length of the text. | |
U_CAPI UBiDiLevel U_EXPORT2 | ubidi_getParaLevel (const UBiDi *pBiDi) |
Get the paragraph level of the text. | |
U_CAPI UBiDiLevel U_EXPORT2 | ubidi_getLevelAt (const UBiDi *pBiDi, UTextOffset charIndex) |
Get the level for one character. | |
U_CAPI const UBiDiLevel* U_EXPORT2 | ubidi_getLevels (UBiDi *pBiDi, UErrorCode *pErrorCode) |
Get an array of levels for each character. | |
U_CAPI void U_EXPORT2 | ubidi_getLogicalRun (const UBiDi *pBiDi, UTextOffset logicalStart, UTextOffset *pLogicalLimit, UBiDiLevel *pLevel) |
Get a logical run. | |
U_CAPI UTextOffset U_EXPORT2 | ubidi_countRuns (UBiDi *pBiDi, UErrorCode *pErrorCode) |
Get the number of runs. | |
U_CAPI UBiDiDirection U_EXPORT2 | ubidi_getVisualRun (UBiDi *pBiDi, UTextOffset runIndex, UTextOffset *pLogicalStart, UTextOffset *pLength) |
Get one run's logical start, length, and directionality, which can be 0 for LTR or 1 for RTL. | |
U_CAPI UTextOffset U_EXPORT2 | ubidi_getVisualIndex (UBiDi *pBiDi, UTextOffset logicalIndex, UErrorCode *pErrorCode) |
Get the visual position from a logical text position. | |
U_CAPI UTextOffset U_EXPORT2 | ubidi_getLogicalIndex (UBiDi *pBiDi, UTextOffset visualIndex, UErrorCode *pErrorCode) |
Get the logical text position from a visual position. | |
U_CAPI void U_EXPORT2 | ubidi_getLogicalMap (UBiDi *pBiDi, UTextOffset *indexMap, UErrorCode *pErrorCode) |
Get a logical-to-visual index map (array) for the characters in the UBiDi (paragraph or line) object. | |
U_CAPI void U_EXPORT2 | ubidi_getVisualMap (UBiDi *pBiDi, UTextOffset *indexMap, UErrorCode *pErrorCode) |
Get a visual-to-logical index map (array) for the characters in the UBiDi (paragraph or line) object. | |
U_CAPI void U_EXPORT2 | ubidi_reorderLogical (const UBiDiLevel *levels, UTextOffset length, UTextOffset *indexMap) |
This is a convenience function that does not use a UBiDi object. | |
U_CAPI void U_EXPORT2 | ubidi_reorderVisual (const UBiDiLevel *levels, UTextOffset length, UTextOffset *indexMap) |
This is a convenience function that does not use a UBiDi object. | |
U_CAPI void U_EXPORT2 | ubidi_invertMap (const UTextOffset *srcMap, UTextOffset *destMap, UTextOffset length) |
Invert an index map. | |
Variables | |||
DOCXX_TAG typedef uint8_t | UBiDiLevel |
UBiDiLevel is the type of the level values in this BiDi implementation. | |
struct | UBiDi |
Forward declaration of the UBiDi structure for the declaration of the API functions. |
This is an implementation of the Unicode Bidirectional algorithm. The algorithm is defined in Unicode Technical Report 9, version 5, and is also described in The Unicode Standard, Version 3.0.
In functions with an error code parameter, the pErrorCode pointer must be valid and the value that it points to must not indicate a failure before the function call. Otherwise, the function returns immediately. After the function call, the value indicates success or failure.
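The error-code convention above can be sketched in isolation. The following is a minimal illustration, not the library's implementation: SketchError, sketch_failed() and sketch_checked_length() are hypothetical stand-ins invented for this example and are not part of the API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an error code type; not the library's type. */
typedef enum { SKETCH_OK = 0, SKETCH_ILLEGAL_ARGUMENT = 1 } SketchError;

/* Nonzero if the code indicates a failure. */
int sketch_failed(SketchError code) {
    return code != SKETCH_OK;
}

/* Hypothetical function that follows the convention: it returns
 * immediately if *pErrorCode already indicates a failure, and
 * otherwise reports its own success or failure through it. */
int32_t sketch_checked_length(const char *text, SketchError *pErrorCode) {
    int32_t n = 0;
    if (pErrorCode == NULL || sketch_failed(*pErrorCode)) {
        return 0; /* earlier failure: do nothing, leave the code untouched */
    }
    if (text == NULL) {
        *pErrorCode = SKETCH_ILLEGAL_ARGUMENT;
        return 0;
    }
    while (text[n] != '\0') {
        ++n;
    }
    return n;
}
```

With this pattern a caller can chain several calls and check the error code once at the end, because every call after the first failure becomes a no-op.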
The "limit" of a sequence of characters is the position just after their last character, i.e., one more than the index of the last character.
Some of the API functions provide access to "runs". Such a "run" is defined as a sequence of characters that are at the same embedding level after performing the BiDi algorithm.
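To make the terms "limit" and "run" concrete, here is a minimal sketch that walks an array of already resolved levels. The helper names sketch_run_limit() and sketch_count_level_runs() are hypothetical, and plain uint8_t and int32_t are assumed in place of UBiDiLevel and UTextOffset; the library's own run functions may work differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the limit (index just after the last
 * character) of the same-level run that begins at `start`. */
int32_t sketch_run_limit(const uint8_t *levels, int32_t length, int32_t start) {
    int32_t limit = start;
    while (limit < length && levels[limit] == levels[start]) {
        ++limit;
    }
    return limit;
}

/* Hypothetical helper: count the same-level runs in logical order. */
int32_t sketch_count_level_runs(const uint8_t *levels, int32_t length) {
    int32_t count = 0;
    int32_t start = 0;
    while (start < length) {
        start = sketch_run_limit(levels, length, start); /* next run starts at this limit */
        ++count;
    }
    return count;
}
```

Because the limit of one run is the start of the next, the same variable can carry both values from one iteration to the next.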
#define DOCXX_TAG () |
#define BIDI_SAMPLE_CODE () |
#define UBIDI_DEFAULT_LTR () |
Paragraph level setting.
If there is no strong character, then set the paragraph level to 0 (left-to-right).
#define UBIDI_DEFAULT_RTL () |
Paragraph level setting.
If there is no strong character, then set the paragraph level to 1 (right-to-left).
#define UBIDI_MAX_EXPLICIT_LEVEL () |
Maximum explicit embedding level.
(The maximum resolved level can be up to UBIDI_MAX_EXPLICIT_LEVEL+1).
#define UBIDI_LEVEL_OVERRIDE () |
Bit flag for level input.
Overrides directional properties.
typedef enum UBiDiDirection UBiDiDirection |
typedef struct UBiDi UBiDi |
enum UBiDiDirection |
UBiDiDirection values indicate the text direction.
U_CAPI UBiDi *U_EXPORT2 ubidi_open (void) |
Allocate a UBiDi structure.
Such an object is initially empty. It is assigned the BiDi properties of a paragraph by ubidi_setPara() or the BiDi properties of a line of a paragraph by ubidi_setLine().
This object can be reused for as long as it is not deallocated by calling ubidi_close().
ubidi_setPara() and ubidi_setLine() will allocate additional memory for internal structures as necessary.
Returns an empty UBiDi object.
U_CAPI UBiDi *U_EXPORT2 ubidi_openSized (UTextOffset maxLength, UTextOffset maxRunCount, UErrorCode * pErrorCode) |
Allocate a UBiDi structure with preallocated memory for internal structures.
This function provides a UBiDi object like ubidi_open() with no arguments, but it also preallocates memory for internal structures according to the sizings supplied by the caller.
Subsequent functions will not allocate any more memory, and are thus guaranteed not to fail because of lack of memory.
The preallocation can be limited to some of the internal memory by setting some values to 0 here. For example, if maxRunCount cannot be reasonably predetermined, and setting it to maxLength (the only failproof value) would waste memory, then maxRunCount can be set to 0 here and the internal structures that are associated with it will be allocated on demand, just like with ubidi_open().
maxLength | is the maximum paragraph or line length that internal memory will be preallocated for. An attempt to associate this object with a longer text will fail, unless this value is 0, which leaves the allocation up to the implementation. |
maxRunCount | is the maximum anticipated number of same-level runs that internal memory will be preallocated for. An attempt to access visual runs on an object that was not preallocated for as many runs as the text was actually resolved to will fail, unless this value is 0, which leaves the allocation up to the implementation. |
The number of runs depends on the actual text and may be anywhere between 1 and maxLength. It is typically small.
pErrorCode | must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
Returns a UBiDi object with preallocated memory.
U_CAPI void U_EXPORT2 ubidi_close (UBiDi * pBiDi) |
ubidi_close() must be called to free the memory associated with a UBiDi object.
Important: If a UBiDi object is the "child" of another one (its "parent"), after calling ubidi_setLine(), then the child object must be destroyed (closed) or reused (by calling ubidi_setPara() or ubidi_setLine()) before the parent object.
pBiDi | is a UBiDi object. |
U_CAPI void U_EXPORT2 ubidi_setPara (UBiDi * pBiDi, const UChar * text, UTextOffset length, UBiDiLevel paraLevel, UBiDiLevel * embeddingLevels, UErrorCode * pErrorCode) |
Perform the Unicode BiDi algorithm.
It is defined in Unicode Technical Report 9, version 5, and is also described in The Unicode Standard, Version 3.0.
This function takes a single plain text paragraph with or without externally specified embedding levels from "styled" text and computes the left-right-directionality of each character.
If the entire paragraph consists of text of only one direction, then the function may not perform all the steps described by the algorithm, i.e., some levels may not be the same as if all steps were performed. This is not relevant for unidirectional text.
For example, in pure LTR text with numbers, the algorithm would give the numbers a resolved level that is 2 higher than that of the surrounding text. This implementation may set all resolved levels to the same value in such a case.
The text must be externally split into separate paragraphs (rule P1). Paragraph separators (B) should appear at most at the very end.
pBiDi | A UBiDi object allocated with ubidi_open() which will be set to contain the reordering information, especially the resolved levels for all the characters in text. |
text | is a pointer to the single-paragraph text that the BiDi algorithm will be performed on (step (P1) of the algorithm is performed externally). The text must be (at least) length long. |
length | is the length of the text; if length==-1 then the text must be zero-terminated. |
paraLevel | specifies the default level for the paragraph; it is typically 0 (LTR) or 1 (RTL). If the function shall determine the paragraph level from the text, then paraLevel can be set to either UBIDI_DEFAULT_LTR or UBIDI_DEFAULT_RTL; if there is no strongly typed character, then the desired default is used (0 for LTR or 1 for RTL). Any other value between 0 and UBIDI_MAX_EXPLICIT_LEVEL is also valid, with odd levels indicating RTL. |
embeddingLevels | (in) may be used to preset the embedding and override levels, ignoring characters like LRE and PDF in the text. A level overrides the directional property of its corresponding (same index) character if the level has the UBIDI_LEVEL_OVERRIDE bit set. |
Except for that bit, it must be paraLevel<=embeddingLevels[]<=UBIDI_MAX_EXPLICIT_LEVEL.
Caution: A copy of this pointer, not of the levels, will be stored in the UBiDi object; the embeddingLevels array must not be deallocated before the UBiDi structure is destroyed or reused, and the embeddingLevels should not be modified to avoid unexpected results on subsequent BiDi operations. However, the ubidi_setPara() and ubidi_setLine() functions may modify some or all of the levels.
After the UBiDi object is reused or destroyed, the caller must take care of the deallocation of the embeddingLevels array.
The embeddingLevels array must be at least length long.
pErrorCode | must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI void U_EXPORT2 ubidi_setLine (const UBiDi * pParaBiDi, UTextOffset start, UTextOffset limit, UBiDi * pLineBiDi, UErrorCode * pErrorCode) |
ubidi_setLine() sets a UBiDi to contain the reordering information, especially the resolved levels, for all the characters in a line of text.
This line of text is specified by referring to a UBiDi object representing this information for a paragraph of text, and by specifying a range of indexes in this paragraph.
In the new line object, the indexes will range from 0 to limit-start-1.
This is used after calling ubidi_setPara() for a paragraph, and after line-breaking on that paragraph. It is not necessary if the paragraph is treated as a single line.
After line-breaking, rules (L1) and (L2) for the treatment of trailing WS and for reordering are performed on a UBiDi object that represents a line.
Important: pLineBiDi shares data with pParaBiDi. You must destroy or reuse pLineBiDi before pParaBiDi. In other words, you must destroy or reuse the UBiDi object for a line before the object for its parent paragraph.
pParaBiDi | is the parent paragraph object. |
start | is the line's first index into the paragraph text. |
limit | is just behind the line's last index into the paragraph text (its last index +1). It must be 0<=start<=limit<=paragraph length. |
pLineBiDi | is the object that will now represent a line of the paragraph. |
pErrorCode | must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
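The relationship between line indexes and paragraph indexes can be sketched as follows. This is only an illustration of the index arithmetic stated above; both helper names are hypothetical and int32_t is assumed in place of UTextOffset.

```c
#include <stdint.h>

/* Hypothetical check of the documented constraint
 * 0<=start<=limit<=paragraph length. */
int sketch_line_range_ok(int32_t start, int32_t limit, int32_t paraLength) {
    return 0 <= start && start <= limit && limit <= paraLength;
}

/* Hypothetical conversion: a line holds limit-start characters, and
 * index i within the line refers to index start+i in the paragraph. */
int32_t sketch_line_to_para_index(int32_t start, int32_t lineIndex) {
    return start + lineIndex;
}
```

For a line covering paragraph indexes 10 up to (but not including) 25, the line has 15 characters and line index 0 corresponds to paragraph index 10.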
U_CAPI UBiDiDirection U_EXPORT2 ubidi_getDirection (const UBiDi * pBiDi) |
Get the directionality of the text.
pBiDi | is the paragraph or line UBiDi object. |
Returns a UBIDI_XXX value that indicates whether the entire text represented by this object is unidirectional, and in which direction, or whether it is mixed-directional.
U_CAPI UTextOffset U_EXPORT2 ubidi_getLength (const UBiDi * pBiDi) |
Get the length of the text.
pBiDi | is the paragraph or line UBiDi object. |
U_CAPI UBiDiLevel U_EXPORT2 ubidi_getParaLevel (const UBiDi * pBiDi) |
Get the paragraph level of the text.
pBiDi | is the paragraph or line UBiDi object. |
U_CAPI UBiDiLevel U_EXPORT2 ubidi_getLevelAt (const UBiDi * pBiDi, UTextOffset charIndex) |
Get the level for one character.
pBiDi | is the paragraph or line UBiDi object. |
charIndex | is the index of a character. |
U_CAPI const UBiDiLevel *U_EXPORT2 ubidi_getLevels (UBiDi * pBiDi, UErrorCode * pErrorCode) |
Get an array of levels for each character.
Note that this function may allocate memory under some circumstances, unlike ubidi_getLevelAt().
pBiDi | is the paragraph or line UBiDi object. |
pErrorCode | must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
Returns NULL if an error occurs.
U_CAPI void U_EXPORT2 ubidi_getLogicalRun (const UBiDi * pBiDi, UTextOffset logicalStart, UTextOffset * pLogicalLimit, UBiDiLevel * pLevel) |
Get a logical run.
This function returns information about a run and is used to retrieve runs in logical order.
This is especially useful for line-breaking on a paragraph.
pBiDi | is the paragraph or line UBiDi object. |
logicalStart | is the first character of the run. |
pLogicalLimit | will receive the limit of the run. The l-value that you point to here may be the same expression (variable) as the one for logicalStart. This pointer can be NULL if this value is not necessary. |
pLevel | will receive the level of the run. This pointer can be NULL if this value is not necessary. |
U_CAPI UTextOffset U_EXPORT2 ubidi_countRuns (UBiDi * pBiDi, UErrorCode * pErrorCode) |
Get the number of runs.
This function may invoke the actual reordering on the UBiDi object, after ubidi_setPara() may have resolved only the levels of the text. Therefore, ubidi_countRuns() may have to allocate memory, and may fail doing so.
pBiDi | is the paragraph or line UBiDi object. |
pErrorCode | must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI UBiDiDirection U_EXPORT2 ubidi_getVisualRun (UBiDi * pBiDi, UTextOffset runIndex, UTextOffset * pLogicalStart, UTextOffset * pLength) |
Get one run's logical start, length, and directionality, which can be 0 for LTR or 1 for RTL.
In an RTL run, the character at the logical start is visually on the right of the displayed run. The length is the number of characters in the run.
ubidi_countRuns() should be called before the runs are retrieved.
pBiDi | is the paragraph or line UBiDi object. |
runIndex | is the number of the run in visual order, in the range [0..ubidi_countRuns(pBiDi)-1]. |
pLogicalStart | is the first logical character index in the text. The pointer may be NULL if this index is not needed. |
pLength | is the number of characters (at least one) in the run. The pointer may be NULL if this is not needed. |
Returns UBIDI_LTR==0 or UBIDI_RTL==1, never UBIDI_MIXED.
    /* show_char() is a placeholder for the application's own output routine */
    UErrorCode errorCode=U_ZERO_ERROR;
    UTextOffset i, count=ubidi_countRuns(pBiDi, &errorCode), logicalStart, visualIndex=0, length;
    for(i=0; i<count; ++i) {
        if(UBIDI_LTR==ubidi_getVisualRun(pBiDi, i, &logicalStart, &length)) {
            do { // LTR: walk the run forward
                show_char(text[logicalStart++], visualIndex++);
            } while(--length>0);
        } else {
            logicalStart+=length;  // logicalLimit
            do { // RTL: walk the run backward from its limit
                show_char(text[--logicalStart], visualIndex++);
            } while(--length>0);
        }
    }
Note that in right-to-left runs, code like this places modifier letters before base characters and second surrogates before first ones.
U_CAPI UTextOffset U_EXPORT2 ubidi_getVisualIndex (UBiDi * pBiDi, UTextOffset logicalIndex, UErrorCode * pErrorCode) |
Get the visual position from a logical text position.
If such a mapping is used many times on the same UBiDi object, then calling ubidi_getLogicalMap() is more efficient.
Note that in right-to-left runs, this mapping places modifier letters before base characters and second surrogates before first ones.
pBiDi | is the paragraph or line UBiDi object. |
logicalIndex | is the index of a character in the text. |
pErrorCode | must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI UTextOffset U_EXPORT2 ubidi_getLogicalIndex (UBiDi * pBiDi, UTextOffset visualIndex, UErrorCode * pErrorCode) |
Get the logical text position from a visual position.
If such a mapping is used many times on the same UBiDi object, then calling ubidi_getVisualMap() is more efficient.
This is the inverse function to ubidi_getVisualIndex().
pBiDi | is the paragraph or line UBiDi object. |
visualIndex | is the visual position of a character. |
pErrorCode | must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI void U_EXPORT2 ubidi_getLogicalMap (UBiDi * pBiDi, UTextOffset * indexMap, UErrorCode * pErrorCode) |
Get a logical-to-visual index map (array) for the characters in the UBiDi (paragraph or line) object.
pBiDi | is the paragraph or line UBiDi object. |
indexMap | is a pointer to an array of ubidi_getLength() indexes which will reflect the reordering of the characters. The array does not need to be initialized. |
The index map will result in indexMap[logicalIndex]==visualIndex.
pErrorCode | must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI void U_EXPORT2 ubidi_getVisualMap (UBiDi * pBiDi, UTextOffset * indexMap, UErrorCode * pErrorCode) |
Get a visual-to-logical index map (array) for the characters in the UBiDi (paragraph or line) object.
pBiDi | is the paragraph or line UBiDi object. |
indexMap | is a pointer to an array of ubidi_getLength() indexes which will reflect the reordering of the characters. The array does not need to be initialized. |
The index map will result in indexMap[visualIndex]==logicalIndex.
pErrorCode | must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI void U_EXPORT2 ubidi_reorderLogical (const UBiDiLevel * levels, UTextOffset length, UTextOffset * indexMap) |
This is a convenience function that does not use a UBiDi object.
It is intended for use when an application has determined the levels of objects (character sequences) and just needs to have them reordered (L2). This is equivalent to using ubidi_getLogicalMap() on a UBiDi object.
levels | is an array with length levels that have been determined by the application. |
length | is the number of levels in the array, or, semantically, the number of objects to be reordered. It must be length>0. |
indexMap | is a pointer to an array of length indexes which will reflect the reordering of the characters. The array does not need to be initialized. |
The index map will result in indexMap[logicalIndex]==visualIndex.
U_CAPI void U_EXPORT2 ubidi_reorderVisual (const UBiDiLevel * levels, UTextOffset length, UTextOffset * indexMap) |
This is a convenience function that does not use a UBiDi object.
It is intended for use when an application has determined the levels of objects (character sequences) and just needs to have them reordered (L2). This is equivalent to using ubidi_getVisualMap() on a UBiDi object.
levels | is an array with length levels that have been determined by the application. |
length | is the number of levels in the array, or, semantically, the number of objects to be reordered. It must be length>0. |
indexMap | is a pointer to an array of length indexes which will reflect the reordering of the characters. The array does not need to be initialized. |
The index map will result in indexMap[visualIndex]==logicalIndex.
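As an illustration of what reordering by rule (L2) means for an array of levels, the sketch below reverses, from the highest level down to the lowest odd level, every contiguous sequence of entries at that level or higher. It is a simplified stand-alone version written for this example, not the library's code; sketch_reorder_visual() is a hypothetical name, and uint8_t and int32_t are assumed in place of UBiDiLevel and UTextOffset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of rule L2: fill indexMap so that
 * indexMap[visualIndex]==logicalIndex. */
void sketch_reorder_visual(const uint8_t *levels, int32_t length, int32_t *indexMap) {
    int maxLevel = 0;
    int minOddLevel = 256; /* sentinel: larger than any uint8_t level */
    int level;
    int32_t i;

    if (levels == NULL || indexMap == NULL || length <= 0) {
        return;
    }
    /* Start from the identity map and find the level bounds. */
    for (i = 0; i < length; ++i) {
        indexMap[i] = i;
        if (levels[i] > maxLevel) {
            maxLevel = levels[i];
        }
        if ((levels[i] & 1) != 0 && levels[i] < minOddLevel) {
            minOddLevel = levels[i];
        }
    }
    /* With no odd level present the loop body never runs. */
    for (level = maxLevel; level >= minOddLevel; --level) {
        int32_t start = 0;
        while (start < length) {
            int32_t limit, lo, hi;
            if (levels[start] < level) {
                ++start;
                continue;
            }
            /* Find the limit of the sequence at this level or higher. */
            limit = start;
            while (limit < length && levels[limit] >= level) {
                ++limit;
            }
            /* Reverse indexMap[start..limit-1] in place. */
            for (lo = start, hi = limit - 1; lo < hi; ++lo, --hi) {
                int32_t tmp = indexMap[lo];
                indexMap[lo] = indexMap[hi];
                indexMap[hi] = tmp;
            }
            start = limit;
        }
    }
}
```

For the levels 1, 1, 2, 2, 1 (right-to-left text containing a left-to-right sequence), this produces the visual-to-logical map 4, 2, 3, 1, 0: the embedded sequence keeps its internal order while everything around it is reversed.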
U_CAPI void U_EXPORT2 ubidi_invertMap (const UTextOffset * srcMap, UTextOffset * destMap, UTextOffset length) |
Invert an index map.
The one-to-one index mapping of the first map is inverted and written to the second one.
srcMap | is an array with length indexes which define the original mapping. |
destMap | is an array with length indexes which will be filled with the inverse mapping. |
length | is the length of each array. |
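Inverting a one-to-one index map amounts to swapping the roles of index and value. The following sketch shows the idea under the assumption that srcMap is a permutation of 0..length-1; sketch_invert_map() is a hypothetical name, int32_t stands in for UTextOffset, and the library function may treat out-of-range entries differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: if srcMap[i]==j then destMap[j]==i.
 * Entries outside 0..length-1 are skipped (an assumption of this sketch). */
void sketch_invert_map(const int32_t *srcMap, int32_t *destMap, int32_t length) {
    int32_t i;
    if (srcMap == NULL || destMap == NULL) {
        return;
    }
    for (i = 0; i < length; ++i) {
        int32_t j = srcMap[i];
        if (j >= 0 && j < length) {
            destMap[j] = i;
        }
    }
}
```

Applied to a visual-to-logical map, this yields the corresponding logical-to-visual map, and vice versa.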
DOCXX_TAG typedef uint8_t UBiDiLevel |
UBiDiLevel is the type of the level values in this BiDi implementation.
It holds an embedding level and indicates the visual direction by its bit 0 (even/odd value).
It can also hold non-level values for the paraLevel and embeddingLevels arguments of ubidi_setPara(); there:
The UBIDI_LEVEL_OVERRIDE bit of an embeddingLevels[] value indicates whether the using application is specifying the level of a character to override whatever the BiDi implementation would resolve it to.
paraLevel can be set to the pseudo-level values UBIDI_DEFAULT_LTR and UBIDI_DEFAULT_RTL.
The related constants are not real, valid level values. UBIDI_DEFAULT_XXX can be used to specify a default for the paragraph level for when the ubidi_setPara() function shall determine it but there is no strongly typed character in the input.
Note that the value for UBIDI_DEFAULT_LTR is even and the one for UBIDI_DEFAULT_RTL is odd, just like with normal LTR and RTL level values - these special values are designed that way. Also, the implementation assumes that UBIDI_MAX_EXPLICIT_LEVEL is odd.
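The way a level value encodes direction and the override flag can be sketched with a few bit operations. All names below are hypothetical, and the flag value 0x80 is an assumption made for this example; real code should use the UBIDI_LEVEL_OVERRIDE constant from the header rather than a literal.

```c
#include <stdint.h>

/* Assumed flag value for illustration only; use UBIDI_LEVEL_OVERRIDE in real code. */
#define SKETCH_LEVEL_OVERRIDE 0x80u

/* Bit 0 gives the direction: odd levels are right-to-left. */
int sketch_is_rtl(uint8_t level) {
    return (level & 1u) != 0;
}

/* Nonzero if the value carries the override flag. */
int sketch_has_override(uint8_t value) {
    return (value & SKETCH_LEVEL_OVERRIDE) != 0;
}

/* Strip the override flag to obtain the plain embedding level. */
uint8_t sketch_plain_level(uint8_t value) {
    return (uint8_t)(value & ~SKETCH_LEVEL_OVERRIDE);
}
```

An embeddingLevels[] entry of 0x81 would then read as level 1 (right-to-left) with the override flag set.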
struct UBiDi |
Forward declaration of the UBiDi structure for the declaration of the API functions.
Its fields are implementation-specific.
This structure holds information about a paragraph of text with BiDi-algorithm-related details, or about one line of such a paragraph.
Reordering can be done on a line, or on a paragraph which is then interpreted as one single line.