Natural Language Operators


The natural language operators enable you to specify search criteria using natural language syntax. The search engine uses natural language analysis to translate the query text into a Verity query language expression for evaluating and scoring documents. The FREETEXT and LIKE natural language operators are intended mainly for use by application developers.

FREETEXT

Interprets text using the free text query parser and scores documents using the resulting query expression. All retrieved documents are relevance-ranked. For information about the free text query parser, refer to Appendix A.

This operator provides the functionality of the free text query parser, but allows you to combine free text queries with other search criteria using the full Verity query language. For example:

<FREETEXT> ( "peace negotiations in the Middle East" ) <AND>
(DATE > 01-01-96)

The quotation marks are required. If you want to include embedded quotes, each one must be preceded by a backslash, as in:

<FREETEXT> ( "\"Independence Day\", \"The Arrival\", science fiction" )
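The escaping rule for embedded quotes can be applied mechanically when an application assembles the query string. The following is a minimal sketch in Python, not part of the Verity API: the helper name freetext is hypothetical, and it handles only the double-quote escaping described above.

```python
def freetext(text: str) -> str:
    """Wrap free text in a <FREETEXT> expression.

    Hypothetical helper: it only precedes each embedded double quote
    with a backslash, as the rule above requires.
    """
    escaped = text.replace('"', '\\"')
    return f'<FREETEXT> ( "{escaped}" )'


# Combine the free text query with another criterion using <AND>.
query = freetext("peace negotiations in the Middle East") + " <AND> (DATE > 01-01-96)"
print(query)

# Embedded quotes are escaped before the text is wrapped.
print(freetext('"Independence Day", "The Arrival", science fiction'))
```

Building the expression in one place keeps the required outer quotation marks and the backslash escaping consistent across an application.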

NOTE: If a query or document contains only words defined as stop words in the collection's style.stp file(s), the free text query parser uses those words for the query and ignores the stop word list.

The FREETEXT operator can be combined with other operators in the same way as the ACCRUE operator.

LIKE

Searches for other documents that are like the one or more sample documents or text passages you provide. The search engine analyzes the provided text to find the most important terms to use for the search. If multiple samples are provided, the search engine assumes that all of the samples are about a single theme and selects important terms common across the samples. Retrieved documents are relevance-ranked.

The LIKE operator accepts a single operand, called the QBE (query-by-example) specification. The QBE specification can be either the literal text of the example to query on, or it can be a specification of one or more full documents and text passages to use as positive and negative examples.

NOTE: If a query or document contains only words defined as stop words in the collection's style.stp file(s), a QBE query with the LIKE operator returns no results.

Syntax

Document specification is made with a series of text references enclosed in braces. The syntax for specifying references is:

{[name=]type:value [name=]type:value ...}

where:

name is either posex (positive example) or negex (negative example).

A negative example reduces the weights of terms when they occur in a positive example. If terms from a negative example do not exist within the positive example, the negative example has no effect. (Hence a negex by itself makes no sense.)

type can be one of the following: vdkdocid, vdkvgwkey, text, or file.

value is a reference to a piece of text to use as the positive or negative example.

If name is not specified, value is assumed to be a reference to a positive example (that is, posex is the implied name).

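The value of value depends on type:

  vdkdocid    a document ID, as in vdkdocid:1234
  vdkvgwkey   a document key (VdkVgwKey), as in vdkvgwkey:doc1
  text        literal text, which can be enclosed in single or double quotes, as in text:'sample text'
  file        a file name, where a byte offset into the file and a byte range from that offset can be optionally specified, as in file:doc.txt:100:200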

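If there is no explicit type specifier, value is interpreted in the following ways:

  A value that begins with # is a document ID (#1234 is read as vdkdocid:1234).
  A value enclosed in quotes is literal text ("stock market" is read as text:"stock market").
  Any other value is a document key (doc1 is read as vdkvgwkey:doc1).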

The LIKE operator can be combined with other operators using the same rules as for the ACCRUE operator.
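The reference syntax described in this section can be assembled programmatically. The following is a minimal sketch in Python under stated assumptions: the function names qbe_spec and like are hypothetical and not part of any Verity API, and values are inserted as given, with no quoting or escaping applied.

```python
from typing import Iterable, Optional, Tuple

# (name, type, value); a name of None leaves posex implied.
Reference = Tuple[Optional[str], str, str]


def qbe_spec(refs: Iterable[Reference]) -> str:
    """Build a brace-enclosed list of [name=]type:value references."""
    parts = []
    has_positive = False
    for name, ref_type, value in refs:
        if name not in (None, "posex", "negex"):
            raise ValueError(f"unknown name: {name!r}")
        if name != "negex":
            has_positive = True
        ref = f"{ref_type}:{value}"
        parts.append(ref if name is None else f"{name}={ref}")
    # A negative example only adjusts terms found in a positive example,
    # so a specification without any positive example is rejected here.
    if not has_positive:
        raise ValueError("a QBE specification needs at least one positive example")
    return "{" + " ".join(parts) + "}"


def like(spec: str) -> str:
    """Wrap a QBE specification in a <LIKE> expression."""
    return f'<LIKE> ( "{spec}" )'


print(like(qbe_spec([
    ("posex", "vdkdocid", "1234"),
    (None, "vdkvgwkey", "doc1"),
    ("negex", "vdkvgwkey", "doc2"),
])))
```

Rejecting a specification that contains only negative examples is a design choice of this sketch; it reflects the statement that a negex by itself has no effect.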

Special Characters in VdkVgwKey Fields

The syntax for the LIKE operator allows VdkVgwKeys to be enclosed in quotes (either single or double) to avoid parsing confusion. This means that VdkVgwKeys containing whitespace, curly braces, or quotes can be handled. A backslash must be used to escape quote characters and backslashes embedded in the key, as is standard for string handling.

The reference value for both text: and vdkvgwkey: references can be enclosed in either single or double quotes, with the usual backslash escaping for embedded backslashes and quotes. For example, a literal text example can be written as {text:'sample text'}.

When a backslash appears in a document key, you must enter two backslashes in the <LIKE> syntax. See "VdkVgwKey Fields on Windows Systems" below for important information about specifying paths on Windows systems.

Syntax examples are below:


<LIKE> ( "{text:'sample text'}" )
<LIKE> ( "{text:"sample text"}" )
<LIKE> ( "{text:"sample 'quote'"}" )
<LIKE> ( "{text:"sample \"quote\""}" )
<LIKE> ( "{vdkvgwkey:keyname}" )
<LIKE> ( "{vdkvgwkey:'{keyname}'}" )
<LIKE> ( "{vdkvgwkey:"{keyname}"}" )
<LIKE> ( "{vdkvgwkey:"c:\\my\\data"}" )

VdkVgwKey Fields on Windows Systems

To specify a VdkVgwKey that includes backslashes on Windows systems, you must escape each of the two required backslashes again. This means you must enter four backslashes for each backslash in the key, as shown in the example below:


<LIKE> ( "{vdkvgwkey:"c:\\\\my\\\\data"}" )
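The backslash rules for document keys lend themselves to a small helper. The following is a minimal sketch in Python; the names escape_key and like_by_key are hypothetical, and the sketch handles only backslashes, so a key with embedded quote characters would need additional escaping that is not shown here.

```python
def escape_key(key: str, windows: bool = False) -> str:
    """Escape backslashes in a VdkVgwKey for use in <LIKE> syntax.

    Each backslash in the key becomes two backslashes, or four when the
    Windows rule described above applies.
    """
    replacement = "\\" * (4 if windows else 2)
    return key.replace("\\", replacement)


def like_by_key(key: str, windows: bool = False) -> str:
    """Build a <LIKE> expression for a single double-quoted document key."""
    return '<LIKE> ( "{vdkvgwkey:"' + escape_key(key, windows) + '"}" )'


# The raw string holds the key as stored, with single backslashes.
print(like_by_key(r"c:\my\data"))
print(like_by_key(r"c:\my\data", windows=True))
```

Keeping the key in its stored form and escaping it only when the expression is built avoids counting backslashes by hand.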

Examples of LIKE Expressions

The following examples illustrate uses of the LIKE operator.

Just literal text:

<LIKE> ("The dog ate the shoe.")

Explicit specification of a single positive example:

<LIKE> ( "{posex=vdkvgwkey:doc1}" )

Explicit specification of multiple positive and negative examples:

<LIKE> ( "{posex=vdkdocid:1234 posex=vdkvgwkey:doc1
negex=text:"stock market"}" )

Same as the preceding but with implied reference types:

<LIKE> ( "{posex=#1234 posex=doc1 negex=\"stock market\"}" )

Similar to the preceding but with implied posex names:

<LIKE> ( "{vdkdocid:1234 vdkvgwkey:doc1}" )

Same as the preceding, but using the most implicit syntax:

<LIKE> ( "{#1234 doc1}" )

You can combine a text reference list with literal text:

<LIKE> ( "{#1234 doc1} And more text" )

The preceding QBE specification is equivalent to this:

<LIKE> ( "{#1234 doc1 text: \"And more text\"}" )

The simplest way of specifying a single positive example by VdkVgwKey:

<LIKE> ( "{doc1}" )

The example is in the file doc.txt, starting at the 100th byte:

<LIKE> ( "{posex=file:doc.txt:100:200}" )

Quotation marks embedded in LIKE expressions must be preceded by backslashes. The backslash tells the engine to treat the character that follows as a literal character.
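The implied reference types used in the examples above (#1234 for vdkdocid:1234, doc1 for vdkvgwkey:doc1, and quoted text for a text: reference) can be expressed as a small resolver. The following Python sketch is inferred from those examples only; the function name explicit_reference is hypothetical, and the engine's actual parsing rules may differ.

```python
def explicit_reference(value: str) -> str:
    """Expand an implicit reference into its explicit type:value form.

    Assumption drawn from the examples in this section: a leading #
    marks a document ID, a quoted value is literal text, and anything
    else is treated as a VdkVgwKey.
    """
    if value.startswith("#"):
        return "vdkdocid:" + value[1:]
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return "text:" + value
    return "vdkvgwkey:" + value


for ref in ["#1234", "doc1", '"stock market"']:
    print(ref, "->", explicit_reference(ref))
```

A resolver like this can help when logging or validating QBE specifications, because every reference is then visible in its explicit form.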

Efficiency Considerations

To process a LIKE expression, the search engine must analyze the full text of the examples in the QBE specification. This can be time-consuming, especially if the example documents are large or require expensive filtering.

The processing of LIKE queries can be accelerated by extracting feature vectors for documents at indexing time. Feature vectors are extracted during indexing when an appropriate entry is made in the style.prm file, as described in the Verity Collection Reference Guide. With feature vectors available in the collection, the search engine does not need to touch the original text of the example documents and LIKE queries are processed very efficiently.





Copyright © 2001, Verity, Inc. All rights reserved.