Documentation
(C) IBM Corp. 1996, 1999

Text Extender: Administration and Programming

Search argument

Search argument syntax

>>-+---------------------------------------------------+-------->
   '-THESAURUS--"thesaurus-name"--+-----------------+--'
                                  '-COUNT--"depth"--'
 
>-----+-----------------------+--------------------------------->
      '-RESULT LIMIT--number--'
 
>-----+-| boolean-argument |--+---------------------------+---+-><
      |                       '-&--| freetext-argument |--'   |
      '-+--------------------------+---| freetext-argument |--'
        '-| boolean-argument |--&--'
 
boolean-argument
 
    .-& or |--------------------------------------------------------------.
    V                                                                     |
|-----+-| search-factor |----------------------------------------------+--+->
      |                       .---------------------------------.      |
      |                       V                                 |      |
      '-(--| search-factor |------+- & -+---| search-factor |---+---)--'
                                  '- | -'
 
>---------------------------------------------------------------|
 
freetext-argument
 
|---IS ABOUT----+-----------------+---+-----------+------------->
                +-SYNONYM FORM OF-+   '-language--'
                +-feature---------+
                '-| thesaurus |---'
 
>----"phrase-or-sentence"----+------------------------+---------|
                             '-ESCAPE--"escape-char"--'
 
search-factor
 
|---+---------------------------------------------------------------------+->
    |                                           .-,---------------.       |
    |                                           V                 |       |
    '-+--------------------+---+-SECTION--+--(-----section-name---+---)---'
      '-MODEL--model-name--'   '-SECTIONS-'
 
>---| search-element |------------------------------------------|
 
search-element
 
|---+-+-----+--| search-primary |--------------------------------------+->
    | '-NOT-'                                                          |
    |                                            .-AND---------------. |
    |                                            V                   | |
    '-| s.-primary |--+-IN SAME PARAGRAPH AS-+------| s.-primary |---+-'
                      '-IN SAME SENTENCE  AS-'
 
>---------------------------------------------------------------|
 
search-primary
 
|---+-| search-atom |----------------+--------------------------|
    |    .-,------------------.      |
    |    V                    |      |
    '-(-----| search-atom |---+---)--'
 
search-atom
 
|---+-----------------------------+---+-----------+------------->
    +-PRECISE FORM OF-------------+   '-language--'
    +-STEMMED FORM OF-------------+
    +-FUZZY FORM OF--match-level--+
    +-SYNONYM FORM OF-------------+
    +-BOUND-----------------------+
    +-SOUNDS LIKE-----------------+
    +-feature---------------------+
    '-| thesaurus |---------------'
 
>----"word-or-phrase"----+-----------------------------+--------|
                         '-ESCAPE--"escape-character"--'
 
thesaurus (if THESAURUS is specified)
 
|---+---------------------+---TERM OF---------------------------|
    '-EXPAND--"relation"--'
 

Examples

Examples are given in Specifying search arguments.

Search parameters

IS ABOUT

An option that lets you specify a free-text search argument, that is, a natural-language phrase or sentence that describes the concept to be found. See Free-text and hybrid search.

MODEL model-name
A keyword used to specify the name of the document model to be used in the search term. The document model describes the structure of documents that contain identifiable sections, so that the content of these sections can be searched individually.

The model name must be defined in a document model file, as described in Working with structured documents. The model name can be masked using wildcard characters.

If you do not specify a model, the default model specified during index creation is used.

SECTION(S) section-name

A keyword used to specify one or more sections to which the search is restricted. Each section name must be defined in a model in a document model file, as described in Working with structured documents. A section name can be masked using the wildcard characters % and _.

Sections can be nested within other sections, for example:

play/Act/Title=play/act/title

Restrictions: Searching in nested sections is possible only for documents stored in columns enabled with format XML. For Ngram indexes, only one section name can be searched, and the XML format is not supported.

THESAURUS thesaurus-name

A keyword used to specify the name of the thesaurus to be used to expand the search term. The thesaurus name is the file name (without its extension) of a thesaurus that has been compiled using the thesaurus compiler TXTHESC or TXTHESN. Two default thesauri, desthes and desnthes, are stored in the sample directory; desnthes is an Ngram thesaurus. You can also specify the path name of the thesaurus file. If you do not specify a path, the dictionary path is used.

COUNT depth

A keyword used to specify the number of levels (the depth) of terms in the thesaurus that are to be used to expand the search term for the given relation. If you do not specify this keyword, a count of 1 is assumed.

RESULT LIMIT number

A keyword used to specify the maximum number of entries to be returned in the result list, where number is a value from 1 to 32767. If a free-text search is used, the entries are ranked with respect to the complete search result list. Otherwise, the ranking is based only on the entries in the limited result list.

EXPAND relation

A keyword used to specify the relation, such as INSTANCE, between the search term specified in TERM OF and the thesaurus terms to be used to expand the search term. The relation name must correspond to a relation used in the thesaurus. See Thesaurus concepts.

For an Ngram thesaurus, use the member-relation name described in Creating an Ngram thesaurus. For user-defined member relations, use :RELATION n, where n is the member relation number specified in :RELATED (number).

TERM OF "word-or-phrase"

The search term, or multi-word search term, to which other search terms are to be added from the thesaurus.
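The combined effect of THESAURUS, EXPAND, COUNT, and TERM OF can be pictured as a breadth-first walk through the thesaurus. The following Python sketch is not Text Extender's implementation: it assumes a hypothetical in-memory thesaurus (the THESAURUS dictionary and the expand_term function are invented for illustration, whereas a real thesaurus is a compiled file) and shows how COUNT depth limits how many levels of the given relation are followed from the TERM OF word.

```python
from collections import deque

# Hypothetical in-memory thesaurus: (term, relation) -> directly related terms.
# A real Text Extender thesaurus is a compiled file produced by TXTHESC or TXTHESN.
THESAURUS = {
    ("vehicle", "INSTANCE"): ["car", "bus"],
    ("car", "INSTANCE"): ["sedan", "convertible"],
    ("bus", "INSTANCE"): ["minibus"],
}


def expand_term(term, relation, depth=1):
    """Return the term plus thesaurus terms up to `depth` levels away via `relation`."""
    found = [term]
    seen = {term}
    queue = deque([(term, 0)])  # breadth-first: (term, level below the TERM OF word)
    while queue:
        current, level = queue.popleft()
        if level == depth:
            continue  # do not follow the relation past the requested COUNT depth
        for related in THESAURUS.get((current, relation), []):
            if related not in seen:
                seen.add(related)
                found.append(related)
                queue.append((related, level + 1))
    return found
```

With the default COUNT of 1, expand_term("vehicle", "INSTANCE") adds only the directly related terms "car" and "bus"; a depth of 2 also adds "sedan", "convertible", and "minibus".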

search-factor
An operand that can be combined with other operands to form a search argument. The evaluation order is from left to right.

The logical AND (&) operator binds stronger than the logical OR (|) operator. Example:

     "passenger" & "vehicle" | "transport" & "public"

is evaluated as:

     ("passenger" & "vehicle") | ("transport" & "public")

To search for:

     "passenger" & ("vehicle" | "transport") & "public"

you must include the parentheses as shown.
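To make the precedence rule concrete, here is a small Python evaluator for a simplified subset of the boolean-argument syntax: quoted single words combined with &, |, and parentheses. It is a sketch under stated assumptions, not Text Extender's parser: a document is modeled as a plain set of words, and the names tokenize and evaluate are hypothetical. The grammar gives & its own rule beneath |, so & binds more tightly, and operators at the same level are applied from left to right.

```python
import re

# Matches either a quoted search term or one of the operators & | ( ).
TOKEN = re.compile(r'\s*(?:"([^"]*)"|([&|()]))')


def tokenize(argument):
    """Split a simplified boolean search argument into (kind, value) tokens."""
    tokens = []
    pos = 0
    argument = argument.rstrip()
    while pos < len(argument):
        match = TOKEN.match(argument, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        if match.group(1) is not None:
            tokens.append(("TERM", match.group(1)))
        else:
            tokens.append(("OP", match.group(2)))
        pos = match.end()
    return tokens


def evaluate(argument, words):
    """Evaluate the argument against a set of words; & binds tighter than |."""
    tokens = tokenize(argument)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def parse_or():  # or-expression: and-expression { "|" and-expression }
        nonlocal pos
        result = parse_and()
        while peek() == ("OP", "|"):
            pos += 1
            right = parse_and()  # parse first so that no tokens are skipped
            result = result or right
        return result

    def parse_and():  # and-expression: primary { "&" primary }
        nonlocal pos
        result = parse_primary()
        while peek() == ("OP", "&"):
            pos += 1
            right = parse_primary()
            result = result and right
        return result

    def parse_primary():  # primary: "term" | "(" or-expression ")"
        nonlocal pos
        kind, value = peek()
        if kind == "TERM":
            pos += 1
            return value in words
        if (kind, value) == ("OP", "("):
            pos += 1
            result = parse_or()
            if peek() != ("OP", ")"):
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        raise ValueError(f"unexpected token {value!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

For a document containing only "transport" and "public", evaluate returns True for the unparenthesized argument above and False for the parenthesized one, because "passenger" is missing.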

NOT search-primary

An operator that lets you exclude from your search the text documents that contain a particular term.

When NOT is used in a search factor, you cannot use the SYNONYM FORM OF keyword.

search-primary IN SAME PARAGRAPH AS search-primary

A keyword that lets you search for a combination of terms occurring in the same paragraph.

The following search argument finds text documents containing the term "traffic" only if the term "air" is in the same paragraph.

     "traffic" IN SAME PARAGRAPH AS "air"

You cannot use the IN SAME PARAGRAPH AS keyword when NOT is used in a search factor.

search-primary IN SAME SENTENCE AS search-primary

A keyword that lets you search for a combination of terms occurring in the same sentence. It is used in the same way as IN SAME PARAGRAPH AS.

AND search-primary

A keyword that lets you combine several search-primaries to be searched for in the same sentence or the same paragraph.

The following search argument searches for "forest", "rain", "erosion", and "land" in the same sentence.

     "forest" IN SAME SENTENCE AS "rain" AND "erosion" AND "land"
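The following Python sketch illustrates what IN SAME SENTENCE AS and IN SAME PARAGRAPH AS, combined with AND, test for. It is an approximation under simple assumptions: sentences end at ., !, or ? followed by white space, paragraphs are separated by blank lines, and terms are single words compared without regard to case. Text Extender's own sentence and paragraph recognition is not modeled, and in_same_unit is a hypothetical name.

```python
import re


def _word_set(text):
    """Lowercased set of the words in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))


def in_same_unit(text, terms, unit="sentence"):
    """Return True if all terms occur together in at least one sentence or paragraph.

    Sentence and paragraph detection is deliberately naive: sentences end at
    . ! or ? followed by white space, and paragraphs are separated by blank lines.
    """
    if unit == "sentence":
        chunks = re.split(r"(?<=[.!?])\s+", text)
    elif unit == "paragraph":
        chunks = re.split(r"\n\s*\n", text)
    else:
        raise ValueError("unit must be 'sentence' or 'paragraph'")
    wanted = {term.lower() for term in terms}
    # Every wanted term must appear in the same chunk.
    return any(wanted <= _word_set(chunk) for chunk in chunks)
```

For example, in_same_unit(text, ["forest", "rain", "erosion", "land"]) corresponds to the search argument above.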

search-atom
If you connect a series of search atoms by commas, the search is successful if a term in any one of the search atoms is found. Each search atom must contain at least one word or phrase.

The following statement is true if at least one of the search terms is found.

     CONTAINS (mytexthandle, '( "text",
                                "graphic",
                                "audio",
                                "video")') = 1

PRECISE FORM OF, STEMMED FORM OF, FUZZY FORM OF, SYNONYM FORM OF, BOUND

Table 5 shows which options are available for each type of index. For example, PRECISE FORM OF is not available for a linguistic index: if you specify it, it is ignored and the default option for that index type is used.

Table 6 describes in more detail how search terms are processed for Ngram indexes.

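Table 5. Linguistic options

Search atom keyword  | Linguistic | Precise | Precise normalized | Dual | Ngram | Ngram case-enabled
---------------------+------------+---------+--------------------+------+-------+-------------------
PRECISE FORM OF      |            | X       | X                  | X    |       | O
STEMMED FORM OF      | X          |         |                    | O    | O     | O
FUZZY FORM OF        |            |         |                    |      | O     | O
IS ABOUT             | O          | O       | O                  | O    |       |
SYNONYM FORM OF      | O          | O       | O                  | O    |       |
EXPAND               | O          | O       | O                  | O    |       |
SOUNDS LIKE          | O          | O       | O                  | O    |       |
IN SAME SENTENCE AS  | O          | O       | O                  | O    | O     | O
IN SAME PARAGRAPH AS | O          | O       | O                  | O    | O     | O
BOUND                |            |         |                    |      | O     | O

X = default setting; O = function available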


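Table 6. Search term options for Ngram indexes

Search atom keyword               | Case-sensitive | Case-insensitive | Stemming | Exact match | Fuzzy match
----------------------------------+----------------+------------------+----------+-------------+------------
PRECISE FORM OF when case-enabled | X              |                  |          | X           |
STEMMED FORM OF                   |                | X                | X        |             |
FUZZY FORM OF                     |                | X                |          |             | X

X = default setting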

If you use a keyword that is not available for that index type, it is ignored and either the default keyword is used instead, or a message is returned.

PRECISE FORM OF
A keyword that causes the word (or each word in the phrase) following PRECISE FORM OF to be searched for exactly as typed, rather than being first reduced to its stem form. For precise and dual indexes, this form of search is case-sensitive; that is, the use of upper- and lowercase letters is significant. For example, if you search for "mouse", you do not find "Mouse".

This is the default option for precise and dual indexes. For a precise normalized index, the default form of search is not case-sensitive. If you specify this keyword for a linguistic index, it is ignored and STEMMED FORM OF is assumed.

STEMMED FORM OF
A keyword that causes the word (or each word in the phrase) following STEMMED FORM OF to be reduced to its word stem before the search is carried out. This form of search is not case-sensitive. For example, if you search for "mouse", you find "Mouse".

The way in which words are reduced to their stem form is language-dependent.

Example: programming computer systems is replaced by program compute system when you use the US-English dictionary, and by programme compute system when you use the UK-English dictionary.

This search phrase can find "programmer computes system", "program computing systems", "programming computer system", and so on.

This is the default option for linguistic indexes. If you specify this keyword for a precise index, it is ignored and PRECISE FORM OF is assumed instead.

FUZZY FORM OF
A keyword for making a "fuzzy" search, which is a search for terms that have a similar spelling to the search term. This is particularly useful when searching in documents that were created by an Optical Character Recognition (OCR) program. Such documents often include misspelled words. For example, the word economy could be recognized by an OCR program as econony.

match-level: An integer from 1 to 5 specifying the degree of similarity, where 5 is more similar than 1.
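The rules for FUZZY FORM OF can be approximated in Python with the standard difflib module. This is only an illustration: the manual does not say how match-level maps to a similarity measure, so the SequenceMatcher ratio and the LEVEL_THRESHOLDS values below are assumptions, as is the fuzzy_match name. The requirement that the first three characters match comes from the summary of rules at the end of this section. Because FUZZY FORM OF is available only for Ngram indexes, which search case-insensitively unless PRECISE FORM OF is used, the comparison ignores case.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds: the manual defines match-level 1 to 5 but does not say
# how a level maps to a similarity measure. These values are illustrative only.
LEVEL_THRESHOLDS = {1: 0.5, 2: 0.6, 3: 0.7, 4: 0.8, 5: 0.9}


def fuzzy_match(search_term, candidate, match_level):
    """Approximate FUZZY FORM OF for a single word (case-insensitive)."""
    if match_level not in LEVEL_THRESHOLDS:
        raise ValueError("match-level must be an integer from 1 to 5")
    term, word = search_term.lower(), candidate.lower()
    # Rule from the summary: the first 3 characters must match.
    if term[:3] != word[:3]:
        return False
    # Ratio is 2 * matching characters / total characters, from 0.0 to 1.0.
    ratio = SequenceMatcher(None, term, word).ratio()
    return ratio >= LEVEL_THRESHOLDS[match_level]
```

With these thresholds, fuzzy_match("economy", "econony", 4) is True (a ratio of about 0.86), but the same call with match level 5 is False.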

SYNONYM FORM OF
A keyword that causes the word or phrase following SYNONYM FORM OF to be searched for together with its synonyms. The synonyms are provided by the dictionary specified by language or, if no language is specified, by the default dictionary.

Synonyms for a phrase are alternative phrases containing all the possible combinations of synonyms that can be obtained by replacing each word of the original phrase by one of its synonyms. The word sequence remains as in the original phrase.

If you specify this keyword for a precise index, it is ignored and PRECISE FORM OF is assumed instead.

If you specify this keyword for a dual index, the search is made using the linguistic part of the dual index rather than the precise part.

You cannot specify this keyword when NOT is used in the search factor, or when the word or phrase to be searched for contains masking characters.
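The rule for phrase synonyms described above amounts to a Cartesian product over each word's alternatives. The Python sketch below uses itertools.product with a small hypothetical SYNONYMS table in place of the language dictionary; synonym_phrases is an illustrative name, not a Text Extender function.

```python
from itertools import product

# Hypothetical synonym dictionary; Text Extender takes synonyms from the
# language dictionary, which is not modeled here.
SYNONYMS = {
    "fast": ["quick", "rapid"],
    "car": ["automobile"],
}


def synonym_phrases(phrase):
    """Return the phrase plus every variant with words replaced by synonyms.

    Each word may be replaced independently; the word order is kept.
    """
    choices = [[word] + SYNONYMS.get(word, []) for word in phrase.split()]
    return [" ".join(combination) for combination in product(*choices)]
```

synonym_phrases("fast car") yields six phrases, from "fast car" to "rapid automobile", all in the original word order.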

BOUND
A keyword for searching in documents that use the Korean CCSID. It causes the search to respect word phrase boundaries. If language is specified, it is ignored; Korean is assumed.

language
A variable that determines which dictionary is used in linguistic processing of text documents during indexing and retrieval. This applies not only to linguistic and dual indexes, but also to precise indexes because these use a dictionary to process stop words.

Linguistic processing includes synonym processing and word-stem processing. See The supported languages for more information.

The supported languages are listed in Languages.
Note: When searching in documents that are not in U.S. English, you must specify the language in the search argument regardless of the default language.

"word-or-phrase"
A word or phrase to be searched for. The characters that can be used within a word are language-dependent. It is also language-dependent whether words need to be separated by separator characters. For English and most other languages, each word in a phrase must be separated by a blank character.

Precise or linguistic search. Text Extender can search using either the precise form of the word or phrase, or a variation of it. If you do not specify one of the options in Table 5, the default option for the type of index being used (marked X in Table 5) applies.

To search for a character string that contains double quotation marks, type the double quotation marks twice. For example, to search for the text "wildcard" character, use:

"""wildcard"" character"
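When a program builds search arguments, the doubling rule can be applied mechanically. This minimal Python sketch (both function names are hypothetical) wraps a term in double quotation marks while doubling any embedded ones, and reverses the process.

```python
def quote_search_term(text):
    """Wrap text in double quotes, doubling any embedded double quotation marks."""
    return '"' + text.replace('"', '""') + '"'


def unquote_search_term(quoted):
    """Reverse quote_search_term: strip the outer quotes and undouble inner ones."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("search term must be enclosed in double quotes")
    return quoted[1:-1].replace('""', '"')
```

quote_search_term('"wildcard" character') returns """wildcard"" character", the form shown above.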

Masking characters. A word can contain the following masking characters:

_ (underscore)
Represents any single character.

% (percent)
Represents any number of arbitrary characters. If a word consists of a single %, then it represents an optional word of any length.

A word cannot be composed exclusively of masking characters, except when a single % is used to represent an optional word.

If you use a masking character, you cannot use SYNONYM FORM OF, feature, or THESAURUS.

ESCAPE escape-character
A character that identifies the next character as one to be searched for and not as one to be used as a masking character.

Example: If escape-character is $, then $%, $_, and $$ represent %, _, and $ respectively. Any % and _ characters not preceded by $ represent masking characters.
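Masking and escape characters map naturally onto regular expressions. The Python sketch below translates a single search word into a compiled regex: _ becomes any one character, % becomes any run of characters, and a character preceded by the escape character is matched literally. It is an illustration only: mask_to_regex is a hypothetical name, matching is case-sensitive as for a precise index, and the phrase-level meaning of a lone % (an optional word) is modeled only as matching any single word.

```python
import re


def mask_to_regex(word, escape=None):
    """Translate a search word containing _ and % masks into a compiled regex.

    _ matches any single character and % matches any number of characters.
    If `escape` is given, the character that follows it is matched literally.
    Matching is case-sensitive, as for a precise index.
    """
    parts = []
    has_literal = False
    chars = iter(word)
    for ch in chars:
        if escape is not None and ch == escape:
            escaped = next(chars, None)
            if escaped is None:
                raise ValueError("escape character at end of word")
            parts.append(re.escape(escaped))  # e.g. $% means a literal %
            has_literal = True
        elif ch == "_":
            parts.append(".")  # any single character
        elif ch == "%":
            parts.append(".*")  # any number of characters
        else:
            parts.append(re.escape(ch))
            has_literal = True
    # A word may not consist only of masks, except a lone % (an optional word).
    if not has_literal and word != "%":
        raise ValueError("a word cannot consist only of masking characters")
    return re.compile("".join(parts), re.DOTALL)
```

Use fullmatch on the result: mask_to_regex("100$%", escape="$") matches "100%" but not "1000".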

Summary of rules and restrictions

Boolean operations
NOT is not allowed after OR.

Dual index
Takes as default STEMMED FORM OF. If masking characters are used, searches are case-sensitive.

FUZZY FORM OF
The first 3 characters must match. Cannot be used if a word in the search atom contains a masking character. Cannot be used in combination with NOT. Can be used only with an Ngram index.

IN SAME PARAGRAPH AS
Cannot be used if NOT is used in a search factor.

IN SAME SENTENCE AS
Cannot be used if NOT is used in a search factor.

Linguistic index
Prevents the use of PRECISE FORM OF. Takes as default STEMMED FORM OF. Masking characters can be used. Searches are case-insensitive.

Masking character
Prevents the use of SYNONYM FORM OF, feature, and THESAURUS.

Ngram index
Masking characters can be used, although not following a non-alphanumeric character. Searches are case-insensitive unless the index is case-enabled and PRECISE FORM OF is used.

NOT
Prevents the use of SYNONYM FORM OF, IN SAME PARAGRAPH AS, and IN SAME SENTENCE AS.

PRECISE FORM OF
Ignored for a linguistic index.

Precise index
Prevents the use of STEMMED FORM OF and SYNONYM FORM OF. Takes as default PRECISE FORM OF. Masking characters can be used. Searches are case-sensitive.

STEMMED FORM OF
Ignored for a precise index, but available for a normalized precise index containing English documents.

SYNONYM FORM OF
Cannot be used if a word in the search atom contains a masking character. Cannot be used in combination with NOT. Cannot be used with a precise index.
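Several of these restrictions can be checked before a query is issued. The following Python sketch encodes a subset of the rules above as data; the option and index-type strings and the check_options function are hypothetical, and the sketch covers only the rules listed here, not every interaction that Text Extender enforces.

```python
# Subsets of options restricted by the rules above (hypothetical labels).
MASK_INCOMPATIBLE = {"SYNONYM FORM OF", "FUZZY FORM OF", "THESAURUS", "feature"}
NOT_INCOMPATIBLE = {"SYNONYM FORM OF", "FUZZY FORM OF",
                    "IN SAME PARAGRAPH AS", "IN SAME SENTENCE AS"}
NGRAM_INDEXES = {"ngram", "ngram case-enabled"}


def check_options(option, index_type, uses_not=False, uses_mask=False):
    """Return the summary rules that `option` would violate (empty list if none)."""
    problems = []
    if uses_mask and option in MASK_INCOMPATIBLE:
        problems.append(f"{option} cannot be used with masking characters")
    if uses_not and option in NOT_INCOMPATIBLE:
        problems.append(f"{option} cannot be used with NOT")
    if option == "FUZZY FORM OF" and index_type not in NGRAM_INDEXES:
        problems.append("FUZZY FORM OF can be used only with an Ngram index")
    if option == "SYNONYM FORM OF" and index_type == "precise":
        problems.append("SYNONYM FORM OF cannot be used with a precise index")
    if option == "PRECISE FORM OF" and index_type == "linguistic":
        problems.append("PRECISE FORM OF is ignored for a linguistic index")
    if option == "STEMMED FORM OF" and index_type == "precise":
        problems.append("STEMMED FORM OF is ignored for a precise index")
    return problems
```

For example, check_options("FUZZY FORM OF", "dual", uses_not=True, uses_mask=True) reports three problems: the masking character, NOT, and the non-Ngram index.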

