Linguistic processing is also used when you browse documents that have been found after a search. It is done in two stages:
The first stage is done without using an electronic dictionary.
Normalization is described in Basic text analysis.
Term expansion is the inverse of reducing a term to its base form. If the index is linguistic, or if the index is dual and the search argument contains the linguistic processing option STEMMED FORM OF, search terms are reduced to their base form before the search begins.
Similarly, if you have a linguistic or a dual index, a document's terms are reduced to their base form before being added to the index. Documents are therefore found on the basis of a term's base form.
When you browse a found document, however, you expect to see all variants of the base form highlighted. To highlight these variants, the found base term is expanded.
All variants (inflections) for each term found in the dictionaries can be produced. These are the inflections produced for the German word gehen (to go):
gegangen geh gehe gehen gehend gehest gehet gehst ging ginge gingen gingest ginget gingst gingt geht
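The expansion step can be illustrated with a minimal sketch. It assumes a hypothetical in-memory table, INFLECTIONS, in place of the product's dictionaries, and it marks highlighted terms with square brackets. The function names expand and highlight are invented for this illustration and are not part of the product API.

```python
import re

# Hypothetical miniature dictionary: base form -> inflections.
# The entries are the variants listed above for the German verb "gehen".
INFLECTIONS = {
    "gehen": [
        "gegangen", "geh", "gehe", "gehen", "gehend", "gehest", "gehet",
        "gehst", "ging", "ginge", "gingen", "gingest", "ginget", "gingst",
        "gingt", "geht",
    ],
}


def expand(base_form):
    """Return all known variants of a base form (the base form itself if unknown)."""
    return INFLECTIONS.get(base_form, [base_form])


def highlight(text, base_form):
    """Wrap every whole-word variant of base_form in square brackets."""
    variants = sorted(expand(base_form), key=len, reverse=True)
    pattern = r"\b(" + "|".join(map(re.escape, variants)) + r")\b"
    return re.sub(pattern, r"[\1]", text, flags=re.IGNORECASE)


print(highlight("Wir gingen nach Hause, aber er geht erst morgen.", "gehen"))
# prints: Wir [gingen] nach Hause, aber er [geht] erst morgen.
```

The document is found through the base form gehen, but the terms highlighted are the inflected forms that actually occur in the text.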
The second stage is extended matching, which can be used on the rare occasions when basic text analysis and normalization cannot highlight a found term. Extended matching finds the more obscure matches.
You choose extended matching by specifying DES_EXTENDED as a parameter in the DesOpenDocument API function.
Extended matching uses the same linguistic processing that is used during linguistic indexing.
These are the occasions when extended matching can find additional matches:
When a search term contains masking characters, the masking characters are processed, the term is reduced to its stem, and the corresponding documents are found. Without extended matching, however, the text that matches the specified search criteria would not be highlighted.
Example: A document contains the inflected term swam.
When a document in a Germanic language contains a compound word and is indexed using a linguistic index, the document index retains the parts of the compound word and the compound word itself. When you search for a part of a compound word, the documents containing the compound word are found, but without extended matching the word is not highlighted.
Example: A document contains the German word Apfelbaum (apple tree).
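The compound-word case can be sketched as follows. The decomposition table COMPOUND_PARTS and the functions index_terms and find_highlights are hypothetical names used only for illustration. A real linguistic index derives the parts of a compound from its dictionaries, and the actual highlighting logic of the product may differ.

```python
# Hypothetical decomposition table; a real linguistic index would derive
# the parts from its dictionaries.
COMPOUND_PARTS = {"apfelbaum": ["apfel", "baum"]}


def index_terms(word):
    """Terms stored for a word: the word itself plus any compound parts."""
    key = word.lower()
    return [key] + COMPOUND_PARTS.get(key, [])


def find_highlights(words, search_term, extended=False):
    """Return the words of a document that would be highlighted."""
    term = search_term.lower()
    hits = []
    for word in words:
        if word.lower() == term:
            hits.append(word)            # direct match, always highlighted
        elif extended and term in index_terms(word):
            hits.append(word)            # match through a compound part
    return hits


doc = ["Im", "Garten", "steht", "ein", "Apfelbaum"]

# The document is found either way, because the index holds the part "baum".
print(any("baum" in index_terms(w) for w in doc))    # True
print(find_highlights(doc, "Baum"))                  # []
print(find_highlights(doc, "Baum", extended=True))   # ['Apfelbaum']
```

In this sketch, the search for Baum finds the document in both cases, but Apfelbaum is returned for highlighting only when extended is set.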
When a word is hyphenated at the end of a line, the result depends on how the hyphen was entered. If the hyphen is inserted automatically by a word processor, the hyphenated word can be found and highlighted. If, however, the hyphen is typed by the user, the documents containing the word are found, but without extended matching the word is not highlighted.
Example: A document contains the hyphenated word container, broken at the end of a line like this:
Another name for a folder is a con-
tainer.
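The difference between the two kinds of hyphen can be sketched as follows. The sketch makes two assumptions that are not taken from the product: an automatically inserted hyphen is represented by the Unicode soft hyphen (U+00AD), and a typed hyphen is an ordinary hyphen followed by a line break. The functions normalize and contains_word are hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"  # assumption: stands in for an automatically inserted hyphen


def normalize(text, extended=False):
    """Rejoin words split by hyphenation before matching."""
    # Automatically inserted hyphens are always removed.
    text = re.sub(SOFT_HYPHEN + r"\s*", "", text)
    if extended:
        # Assumption: a typed hyphen followed by a line break splits one word.
        text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    return text


def contains_word(text, word, extended=False):
    """True if word appears as a whole word after normalization."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, normalize(text, extended)) is not None


typed = "Another name for a folder is a con-\ntainer."
automatic = "Another name for a folder is a con\u00ad\ntainer."

print(contains_word(automatic, "container"))              # True
print(contains_word(typed, "container"))                  # False
print(contains_word(typed, "container", extended=True))   # True
```

The sketch reports only whether the word can be matched. To highlight the word in the original text, the match position would also have to be mapped back to the hyphenated form.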