The Thesaurus Control File


A thesaurus control file with a .ctl extension is required as input to build a thesaurus file using the mksyd command-line tool. The output of a successful mksyd command is a compiled thesaurus file with the .syd extension.

The existing English thesaurus control file contains synonym list definitions in list: keyword statements. Each synonym list definition contains a list of synonyms and, optionally, keys. If no keys are specified for a given list, every synonym in the list is a key, and the list is circular: it is found whenever the thesaurus is queried for any word in the list. If keys are given and some words in the list are not keys, the list is asymmetric and is found only when the thesaurus is queried for one of the given keys.

If you want to create a thesaurus file in another language, you can either create the control file using an ASCII editor or purchase a thesaurus in the language of choice and add the statements in the following section to the control file.

Sample Thesaurus Control File

The following is a sample of the structure of a thesaurus control file.


$control:1
synonyms:
{
list: "abort,miscarry,terminate,halt,end,fail"
list: "cease,stop,desist,terminate,end,discontinue"
list: "karma <or> fate <or> destiny"
/keys = "karma"
}
$$
The first two lists are circular, while the last is asymmetric. Note that the words "terminate" and "end" are keys in two lists. In this situation, a thesaurus query on either "terminate" or "end" produces an expanded query containing both lists:


"(cease,stop,desist,terminate,end,discontinue) <or>
(abort,miscarry,terminate,halt,end,fail)"
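
The lookup behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch of the circular and asymmetric rules, not the mksyd implementation; the names build_index and lookup are hypothetical, and the order in which matching lists are returned is not meant to mirror the real thesaurus.

```python
from collections import defaultdict

def build_index(lists):
    """Map each key to the synonym lists it retrieves.

    `lists` holds (synonyms, keys) pairs. keys=None means no /keys
    modifier was given, so every synonym is a key (a circular list).
    """
    index = defaultdict(list)
    for synonyms, keys in lists:
        for key in (keys if keys is not None else synonyms):
            index[key].append(synonyms)
    return index

# The three lists from the sample control file.
lists = [
    (["abort", "miscarry", "terminate", "halt", "end", "fail"], None),
    (["cease", "stop", "desist", "terminate", "end", "discontinue"], None),
    (["karma", "fate", "destiny"], ["karma"]),  # asymmetric: only "karma" is a key
]
index = build_index(lists)

def lookup(word):
    """Return every synonym list found for `word` (empty if it is not a key)."""
    return index.get(word, [])
```

In this model, lookup("terminate") returns both circular lists, lookup("karma") returns the asymmetric list, and lookup("fate") returns nothing because "fate" is not a key.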

The synonyms Keyword

The synonyms keyword is required in a thesaurus control file. It must appear directly after the $control:1 directive.

The list Keyword

The list: keyword specifies the synonyms in a list, either in query form or in a list of words or phrases separated by commas. The optional modifier /keys specifies the keys list, which must be a list of words separated by commas. If /keys is absent, all synonyms in the list become keys. The optional modifier /op-default defines the fallback operator to use if there is no match for a thesaurus query.

The maximum length for lists is 32,000 characters.
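
When control files are generated by a script, a small helper can emit list statements and guard against the length limit. The following Python sketch is an assumption-laden illustration: render_list is a hypothetical name, the output layout copies the sample control file shown earlier, and it assumes the 32,000-character limit applies to the quoted list text.

```python
MAX_LIST_CHARS = 32_000  # limit from the text; what exactly is counted is an assumption

def render_list(synonyms, keys=None):
    """Render one list statement, with an optional /keys modifier line."""
    body = ",".join(synonyms)
    if len(body) > MAX_LIST_CHARS:
        raise ValueError(f"list is {len(body)} characters; limit is {MAX_LIST_CHARS}")
    lines = [f'list: "{body}"']
    if keys:
        # /keys takes a comma-separated list of words, as in the sample file.
        key_text = ",".join(keys)
        lines.append(f'/keys = "{key_text}"')
    return "\n".join(lines)

statement = render_list(["karma", "fate", "destiny"], keys=["karma"])
```

Here statement holds a two-line list definition in which only "karma" is a key. Omitting the keys argument produces a circular list.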

NOTE: If you separate your list into multiple lines (inserting new lines), you must include a backslash (\) at the end of each line so that the lines are treated as one list.

The following is a sample list keyword statement:


list:"happy, joyous, joyful, glad, blithe, merry,\
cheerful, contented, blissful, delighted, satisfied,\
pleased, favored, lucky, fortunate, propitious,\
appropriate, felicitous, befitting"
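
To make the continuation rule concrete, the Python sketch below joins backslash-terminated lines into one logical statement and then splits out the synonyms. It is a simplified reader written for illustration, not the parser used by mksyd; the function names are hypothetical, and stripping whitespace around each entry is an assumption.

```python
def join_continued_lines(text):
    """Merge each line ending in a backslash with the line that follows it."""
    logical, pending = [], ""
    for line in text.splitlines():
        if line.endswith("\\"):
            pending += line[:-1]  # drop the backslash, keep accumulating
        else:
            logical.append(pending + line)
            pending = ""
    if pending:  # a trailing continuation with nothing after it
        logical.append(pending)
    return logical

def split_list(statement):
    """Return the comma-separated entries of a list statement."""
    body = statement.split(":", 1)[1].strip().strip('"')
    return [item.strip() for item in body.split(",")]

# The sample statement from above; a raw string keeps the backslashes.
sample = r'''list:"happy, joyous, joyful, glad, blithe, merry,\
cheerful, contented, blissful, delighted, satisfied,\
pleased, favored, lucky, fortunate, propitious,\
appropriate, felicitous, befitting"'''

statements = join_continued_lines(sample)
words = split_list(statements[0])
```

The four physical lines collapse into a single statement, and words contains all 19 synonyms from "happy" to "befitting".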

The qparser Keyword

The qparser keyword defines the query parser expansion operators for expanded thesaurus queries. The default values are described below.

Type of Expansion         Operator Used for Expansion
leaf operator             <STEM>
combinational operator    <ANY>
phrase operator           <PHRASE>

A comma is the default combinational operator in Verity applications, but its meaning depends on the type of expansion. In default thesaurus query expansion, commas are interpreted as the <ANY> operator; in default simple query expansion, they are interpreted as the <ACCRUE> operator.

To use the same default operators in thesaurus query expansion as in simple query expansion, specify simple as the argument to qparser, as in:

qparser: simple





Copyright © 2002, Verity, Inc. All rights reserved.