Identifying the text language

Identify the text language for an object store to ensure accurate text processing and optimal indexing performance. For example, you can identify the text language as French. However, if the objects that belong to CBR-enabled classes are not all in the same language, set text language identification to automatic. Automatic language identification occurs on an object-by-object basis. Objects can be indexed only if you either identify the text language for the object store or set text language identification to automatic.
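
The choice between a single language code and automatic identification can be sketched as a small decision rule. This is an illustration only: the function name choose_indexing_language and the codes "fr" and "en" are assumptions made for the example, not product API names or confirmed valid codes.

```python
def choose_indexing_language(object_languages):
    """Return the one shared language code, or "AUTO" otherwise.

    object_languages: the known language of each object in CBR-enabled
    classes; None marks an object whose language is unknown.
    """
    distinct = set(object_languages)
    # A single code is safe only when every object shares one known language.
    if len(distinct) == 1 and None not in distinct:
        return distinct.pop()
    # Mixed, unknown, or empty input: fall back to automatic identification.
    return "AUTO"
```

With this rule, a store that holds only French objects yields "fr", and a store that mixes French and English objects yields "AUTO".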

Automatic language identification is not always accurate. An IBM® Content Search Services index server can misidentify the text language for an object, especially when the object has few words. An object with a misidentified text language can be found by exact match searches but not by language-aware searches. In particular, the word stem search that a search server performs automatically for a search term cannot find an object with a misidentified text language. For example, if you search for lions, objects with the stem lion are not found if the language for those objects has been misidentified.
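
The effect of a misidentified language on stem searches can be illustrated with a toy model. The sketch below is not how IBM Content Search Services is implemented: the suffix rules, the language codes, and the functions stem, build_index, exact_search, and stem_search are all hypothetical and exist only to show why stems that are produced under the wrong language do not match the stem of the search term.

```python
# Toy suffix rules per language (hypothetical; real stemmers are far richer).
SUFFIXES = {
    "en": ("s",),
    "fr": ("aux", "x"),
}

def stem(word, lang):
    """Strip the first matching suffix for the given language, if any."""
    for suffix in SUFFIXES.get(lang, ()):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def build_index(text, lang):
    """Index the exact words and the stems computed under the given language."""
    words = text.lower().split()
    return {"exact": set(words), "stems": {stem(w, lang) for w in words}}

def exact_search(term, index):
    """Exact match search: independent of the identified language."""
    return term.lower() in index["exact"]

def stem_search(term, index, query_lang):
    """Stem search: compares the stem of the term with the indexed stems."""
    return stem(term.lower(), query_lang) in index["stems"]

text = "two lions sleeping"
correct = build_index(text, "en")  # language identified correctly
wrong = build_index(text, "fr")    # language misidentified as French

found_correct = stem_search("lions", correct, "en")
found_wrong = stem_search("lions", wrong, "en")
found_exact = exact_search("lions", wrong)
```

In this model, the stem search finds the correctly identified object, misses the misidentified one because its index holds lions rather than the stem lion, and the exact match search still finds the misidentified object.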

To identify the text language or to set text language identification to automatic for an object store:

  1. Access the CBR-related properties for the object store. See Working with object store properties.
  2. In Indexing Language, select either the code for the text language or AUTO for automatic language identification. For information about the valid language codes, see "Indexing Language" in Object store properties (CBR IBM Search tab).
  3. Click OK.

Related concepts
For information about word stem searching, see Word stems.