Documentation
(C) IBM Corp. 1996, 1999

Text Extender: Administration and Programming


Linguistic processing for retrieval

Query processing aims to make search terms less restrictive so that searches achieve a higher recall rate, that is, so that more of the relevant documents are found. Two basic operations on query terms serve that goal: expansion and reduction. Some search term operations involve both.

Synonyms

Synonyms are semantically related words. Usually, they belong to the same word class or classes (such as noun or verb) as the source term. Synonyms are obtained from a separate file for each language. They are always returned in base form and, with a few exceptions, are not multi-word terms. The words of a search term are always reduced to their base form before synonyms are looked up. Here are some examples of a word's synonyms in three languages:

Thesaurus expansion

A search term can be expanded with the thesaurus terms that can be reached from it through a specific relation. The relation may be hierarchical (such as the "Narrower term" relation), associative (such as the "Related term" relation), or a synonym relation. A thesaurus term may be, and often is, a multi-word term.

"Thesaurus concepts" describes thesaurus expansion in more detail.
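
To make the idea of reaching terms "through a specific relation" concrete, here is a minimal Python sketch. It is not the Text Extender implementation or its API. The toy thesaurus, the relation names "narrower", "related", and "synonym", the function name expand, and the depth parameter are all assumptions made for illustration.

```python
# Hypothetical toy thesaurus: term -> relation name -> directly related terms.
# The entries and relation names are illustrative, not taken from any product.
THESAURUS = {
    "vehicle": {
        "narrower": ["car", "bicycle"],
        "related": ["road traffic"],
    },
    "car": {
        "narrower": ["sports car"],
        "synonym": ["automobile"],
    },
}


def expand(term, relation, depth=1):
    """Return the term plus all terms reachable through one relation.

    depth limits how many steps along the relation are followed
    (an assumption of this sketch).
    """
    found = [term]
    frontier = [term]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            for related in THESAURUS.get(current, {}).get(relation, []):
                if related not in found:
                    found.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return found


one_step = expand("vehicle", "narrower")
two_steps = expand("vehicle", "narrower", depth=2)
synonyms = expand("car", "synonym")
```

In this sketch, following "narrower" from "vehicle" for two steps also reaches the multi-word term "sports car", while a term that has no entry in the thesaurus is returned unchanged.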

Sound expansion

Sound expansion expands a single word into a set of similar-sounding words. It is particularly useful when the exact spelling of a search term is not known.
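
The text does not say which phonetic algorithm is used. As an illustration only, the following Python sketch substitutes the well-known American Soundex code and expands a word into all indexed words that share its code. The function names soundex and sound_expand and the sample word list are assumptions of this sketch.

```python
# Soundex digit for each consonant group; vowels, H, W, and Y have no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character American Soundex code of a word."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    result = word[0]
    previous = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated digit counts again
    return (result + "000")[:4]


def sound_expand(term, indexed_words):
    """Return every indexed word whose Soundex code equals that of the term."""
    target = soundex(term)
    return [word for word in indexed_words if soundex(word) == target]


# Hypothetical index contents.
INDEXED_WORDS = ["Meyer", "Meier", "Maier", "Miller", "Mayor"]
matches = sound_expand("Mayer", INDEXED_WORDS)
```

Here a search for "Mayer" is expanded to the indexed spellings that share its code, while "Miller" is left out because its code differs.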

Character and word masking

Masking is a non-linguistic expansion technique in which a regular expression is replaced with the disjunction of all indexed words that satisfy it. Neither a masked expression nor any of its expansions is subject to lemmatization, stop-word extraction, or any of the other expansion techniques. As a result, an irregular verb form such as swum, when searched with the masked term swu*, is matched in a precise index but not in a linguistic index, where this form has been lemmatized to swim.

Word masking can make searches slow, especially in large indexes.
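
The swum example can be reproduced with a small Python sketch. It assumes that * stands for any sequence of characters, as in swu*, and that an index can be modeled as a plain list of words. The function names and index contents are hypothetical, and the linear scan is a simplification, not a description of how the product evaluates masks.

```python
import re


def mask_to_regex(mask):
    """Translate a mask into a compiled pattern; * matches any character run."""
    parts = [re.escape(part) for part in mask.split("*")]
    return re.compile(".*".join(parts))


def expand_mask(mask, indexed_words):
    """Return all indexed words that satisfy the mask.

    The mask is compared with the index entries as they are stored;
    no lemmatization is applied to the mask or to its expansions.
    """
    pattern = mask_to_regex(mask)
    return [word for word in indexed_words if pattern.fullmatch(word)]


# Hypothetical index contents. The precise index stores surface forms;
# the linguistic index stores base forms, so "swum" is stored as "swim".
PRECISE_INDEX = ["swam", "swim", "swims", "swum", "swung"]
LINGUISTIC_INDEX = ["swim", "swing"]

precise_hits = expand_mask("swu*", PRECISE_INDEX)
linguistic_hits = expand_mask("swu*", LINGUISTIC_INDEX)

# The query then becomes the disjunction of the expansions.
disjunction = " OR ".join(precise_hits)
```

In this sketch every indexed word is tested against the mask, which suggests one reason why masked searches may become slower as the index grows.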

