IBM® Content Search Services employs language-aware processing for both documents and search expressions. This processing requires that the text language be properly identified. For a search expression, language misidentification cannot occur because a search expression is run for all indexed languages. For a document, language misidentification causes incomplete indexing, which can prevent some types of searches from finding the document.
In particular, the following types of searches might fail to find an incompletely indexed document:
Language-aware token search | A language-aware search works only for those incompletely indexed documents that happen to contain the relevant word stems. For example, a search for "tested" always finds a document with the word "test", but it does not find an incompletely indexed document with the word "testing". |
---|---|
Wildcard search | A wildcard search might fail for a document whose actual text language is one of the following ones:
This rule does not apply to documents whose misidentified language and actual language are both Asian. (The Asian language might be a supported or an unsupported language.) For example, if a Japanese document is misidentified as a Chinese document, Japanese wildcard searches still find the document. |
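
The language-aware token search behavior described above can be illustrated with a toy model. The following sketch is not the IBM Content Search Services implementation: the suffix list, the `stem` helper, and the index layout are all assumptions made for illustration. It assumes that a correctly identified document is indexed with both surface words and stems, while a misidentified (incompletely indexed) document is indexed with surface words only.

```python
# Toy model only: the suffix list and index layout are assumptions,
# not the IBM Content Search Services implementation.
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip one known suffix to approximate an English word stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def index_document(text, language_identified):
    """Return the set of index terms for a document.

    A correctly identified document is indexed with surface words and stems.
    A misidentified document is indexed with surface words only.
    """
    tokens = text.lower().split()
    terms = set(tokens)
    if language_identified:
        terms.update(stem(token) for token in tokens)
    return terms


def language_aware_match(query, terms):
    """Match when the stem of the query word appears among the index terms."""
    return stem(query.lower()) in terms


complete = index_document("testing complete", language_identified=True)
incomplete_with_stem = index_document("a test", language_identified=False)
incomplete_without_stem = index_document("testing", language_identified=False)

print(language_aware_match("tested", complete))
print(language_aware_match("tested", incomplete_with_stem))
print(language_aware_match("tested", incomplete_without_stem))
```

In this model, the incompletely indexed document is found only when its text happens to contain the stem itself, which mirrors the "test" versus "testing" example in the table.
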
The text language of a document might be misidentified for the following reasons:
The text language is not one of the possible languages | If the text language for a document is not one of the possible languages for the object store, the language is necessarily misidentified. For example, if you selected only English and French as the possible languages, the language of any Spanish document is misidentified. |
---|---|
The amount of text is insufficient | Typically, the less text that a document contains, the higher the risk of inaccurate language identification. |
The text language is not the default language | An analysis of the text language might fail to positively identify the language. In this case, IBM Content Search Services presumes the language to be the one that you selected as the default for the object store. For example, if you selected English as the default object store language and the language of a French document cannot be positively identified, the language is presumed to be English. |
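
The three causes above can be sketched with a small, hypothetical language identifier. Everything in this example is an assumption made for illustration: the stopword lists, the `min_hits` threshold, and the `identify_language` function are not part of any product API, and the real identification method is not described in this topic. The sketch only considers the possible languages for the object store and falls back to the default language when no language is positively identified.

```python
# Hypothetical sketch: the stopword lists, threshold, and function names are
# illustrative assumptions, not product behavior or a product API.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "fr": {"le", "la", "et", "est", "les"},
    "es": {"el", "los", "y", "es", "las"},
}


def identify_language(text, possible_languages, default_language, min_hits=2):
    """Pick the best-scoring possible language, or fall back to the default."""
    tokens = text.lower().split()
    best_language, best_hits = None, 0
    for language in possible_languages:
        hits = sum(1 for token in tokens if token in STOPWORDS[language])
        if hits > best_hits:
            best_language, best_hits = language, hits
    # Too little evidence: the language is not positively identified.
    if best_hits < min_hits:
        return default_language
    return best_language


possible = ["en", "fr"]  # Spanish is not a possible language here

print(identify_language("le chat et la souris", possible, "en"))
print(identify_language("el perro y los gatos", possible, "en"))
print(identify_language("le chat", possible, "en"))
```

With English and French as the possible languages and English as the default, the longer French text is identified as French, the Spanish text can never be identified as Spanish, and the two-word French text carries too little evidence and is presumed to be English.
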