(C) IBM Corp. 1996, 1999

Text Extender: Administration and Programming


Linguistic processing for browsing

Linguistic processing is also used when you browse documents that have been found after a search. It is done in two stages:

  1. Basic text analysis: normalization and term expansion

  2. Extended matching

Stage 1: Normalization and term expansion

The first stage is done without using an electronic dictionary.

Normalization

Normalization is described in "Basic text analysis".

Term expansion

Term expansion is the inverse of reducing a term to its base form. Search terms are reduced to their base form before the search begins in two cases: when the index is linguistic, or when the index is dual and the search argument contains the linguistic processing option STEMMED FORM OF.

Similarly, if you have a linguistic or a dual index, a document's terms are reduced to their base form before being added to the index. Documents are therefore found on the basis of a term's base form.
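The effect of reducing both document terms and search terms to a base form can be sketched as follows. This is only an illustration, not Text Extender's implementation: the BASE_FORMS table and the functions base_form, build_index, and search are hypothetical names, and the hand-written table stands in for real linguistic analysis.

```python
# Toy table mapping inflected forms to a base form (hypothetical stand-in
# for the linguistic processing that a real linguistic index performs).
BASE_FORMS = {
    "ging": "gehen",
    "gegangen": "gehen",
    "geht": "gehen",
    "gehen": "gehen",
}


def base_form(term):
    """Return the base form of a term, or the lowercased term if unknown."""
    lowered = term.lower()
    return BASE_FORMS.get(lowered, lowered)


def build_index(documents):
    """Map each base form to the set of document ids that contain a variant."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.split():
            key = base_form(word.strip(".,;:!?"))
            index.setdefault(key, set()).add(doc_id)
    return index


def search(index, term):
    """Reduce the search term to its base form, then look it up."""
    return index.get(base_form(term), set())


docs = {
    "d1": "Er ging nach Hause.",
    "d2": "Sie ist gegangen.",
    "d3": "Das Haus ist alt.",
}
index = build_index(docs)
matches = search(index, "geht")
```

Because "ging", "gegangen", and "geht" all map to the base form "gehen", a search for any one of them finds the documents that contain the others.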

When you browse a found document, however, you expect to see all variants of the base form highlighted. To highlight these variants, the found base term is expanded.

Term expansion can produce all variants (inflections) of each term found in the dictionaries. For example, these are the inflections produced for the German word gehen (to go):

gegangen  geh    gehe    gehen    gehend  gehest  gehet  gehst
ging      ginge  gingen  gingest  ginget  gingst  gingt  geht
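Expanding a found base term in order to highlight its variants can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's algorithm: the INFLECTIONS table, the functions expand and highlight, and the bracket markers are all hypothetical, and the table simply reuses the inflections of gehen listed above.

```python
import re

# Inflections of "gehen" as listed above. The lookup table itself is a
# hypothetical stand-in for the product's term expansion.
INFLECTIONS = {
    "gehen": [
        "gegangen", "geh", "gehe", "gehen", "gehend", "gehest", "gehet", "gehst",
        "ging", "ginge", "gingen", "gingest", "ginget", "gingst", "gingt", "geht",
    ],
}


def expand(base_term):
    """Return all known variants of a base term (the term itself if unknown)."""
    return INFLECTIONS.get(base_term, [base_term])


def highlight(text, base_term, open_mark="[", close_mark="]"):
    """Wrap every whole-word occurrence of any variant in highlight markers."""
    # Try longer variants first so a short variant never shadows a longer one.
    variants = sorted(expand(base_term), key=len, reverse=True)
    alternatives = "|".join(re.escape(v) for v in variants)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, text)


result = highlight("Er ging nach Hause, und wir gehen auch.", "gehen")
```

Here the document was found through the base form gehen, but the occurrences highlighted in the text are the inflected forms that actually appear in it.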

Stage 2: Extended matching

The second stage, extended matching, is for the rare occasions when the normalization and term expansion of the first stage cannot highlight a found term. Extended matching finds the more obscure matches.

You choose extended matching by specifying DES_EXTENDED as a parameter in the DesOpenDocument API function.

Extended matching uses the same linguistic processing that is done when documents are indexed linguistically.

These are the occasions when extended matching can find additional matches:

