Using the style.ufl File


Using the style.ufl file, you can choose to index the field values of certain field types in a collection so that information agents can perform field searches more quickly. Field indexes are distinct from word indexes, and they are entirely optional.

Two types of indexed fields can be implemented: indexed and minmax. Both are discussed below. Indexing field values lengthens the time it takes to index documents, but the payoff is improved search speed.

Default style.ufl File

The default style.ufl file is located in the k2/common/vdkstyle directory along with the other default style files; its contents are shown below. The comments in the file explain the available options and the default behavior.


# style.ufl - Application-specific User Fields
#
# These fields are included in the internal documents table. For
# more information about adding fields to the internal documents
# table, see the "Field Definitions" chapter in the Collection
# Building Guide.
#
# Example:
#
# data-table: ddf
# {
# varwidth: MyTitle dxa
# }
# ----------------------------------------------------------------
# Specify additional application-specific fields here in their own
# data-table[s].

Indexed Field Type

To index a field defined in the document collection, specify the /indexed = yes modifier for that field in the relevant style.ufl file.

Verity recommends that you use indexed fields as follows:

The index for an indexed field can be case-sensitive. If the /case-sensitive entry is specified in addition to the /indexed entry for a field keyword in a style.ufl file, a case-sensitive index of the field's values is created.

Creating case-sensitive field indexes is valuable when queries contain case-sensitive search criteria. For example, if a query uses the CASE modifier, a case-sensitive field index can speed retrieval.

If you create case-sensitive field indexes but users do not issue case-sensitive queries, the indexes are not used efficiently, and users will see no improvement in search speed from indexing those fields.
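To make the benefit concrete, here is a toy Python model of a single indexed field. It is a sketch under stated assumptions, not how Verity stores field indexes: the documents, `build_index`, and `case_query` are all hypothetical. A case-sensitive index keys on the exact value, so a case-sensitive query is answered from the index alone; a case-folded index only narrows the candidates, and each candidate record must still be read to check its case.

```python
from collections import defaultdict

# Hypothetical documents: doc id -> value of one indexed field.
DOCS = {1: "Smith", 2: "smith", 3: "SMITH", 4: "Jones"}


def build_index(docs, case_sensitive):
    """Map each field value (case-folded unless case_sensitive) to its doc ids."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        key = value if case_sensitive else value.lower()
        index[key].add(doc_id)
    return index


def case_query(docs, index, case_sensitive, term):
    """Answer a case-sensitive query for `term`.

    Returns (matching doc ids, number of document records read).
    """
    if case_sensitive:
        # Exact-case keys: the index alone answers the query.
        return set(index.get(term, set())), 0
    # Folded keys only narrow the candidates; each candidate record
    # must be read to compare the original case.
    candidates = index.get(term.lower(), set())
    hits = {d for d in candidates if docs[d] == term}
    return hits, len(candidates)


cs_index = build_index(DOCS, case_sensitive=True)
ci_index = build_index(DOCS, case_sensitive=False)
print(case_query(DOCS, cs_index, True, "Smith"))   # ({1}, 0)
print(case_query(DOCS, ci_index, False, "Smith"))  # ({1}, 3)
```

In this model, the extra exact-case keys only pay off when the query itself is case-sensitive, which mirrors the guidance above.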

Each indexed field must be in its own data table. A data table containing an indexed field must not contain any other fields. At index time, the engine issues an error message if this rule is violated:


Error E0-0448 (Vdb): Indexed fields must be in their own table (NAME)

where NAME is the name of the indexed field that is not defined in its own data table.
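The rule can be expressed as a simple check over data-table definitions. The Python sketch below is hypothetical: it does not parse style.ufl, the `FieldDef` and `DataTable` classes are invented for illustration, and the table and field names are placeholders. It flags any indexed field that shares its data table with another field and formats the message after the error text above.

```python
from dataclasses import dataclass, field


@dataclass
class FieldDef:
    # Hypothetical stand-in for a fixwidth/varwidth field definition.
    name: str
    indexed: bool = False


@dataclass
class DataTable:
    # Hypothetical stand-in for a data-table block in style.ufl.
    name: str
    fields: list = field(default_factory=list)


def check_indexed_tables(tables):
    """Return one error per indexed field that is not alone in its data table."""
    errors = []
    for table in tables:
        if len(table.fields) > 1:
            for f in table.fields:
                if f.indexed:
                    errors.append(
                        f"Indexed fields must be in their own table ({f.name})"
                    )
    return errors


# Two indexed fields, each in its own data table: no error.
ok = [DataTable("ddf", [FieldDef("MyTitle", indexed=True)]),
      DataTable("ddg", [FieldDef("Author", indexed=True)])]
# An indexed field sharing a table with another field: one error.
bad = [DataTable("ddf", [FieldDef("MyTitle", indexed=True), FieldDef("Author")])]
print(check_indexed_tables(ok))
print(check_indexed_tables(bad))
```

The first configuration passes because each indexed field has a table to itself; data segments are not modeled here.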

It is not an error to have more than one indexed varwidth field in the same data segment. You can define two indexed varwidth fields in two different data tables in the same data segment. This configuration can affect searching and results display, as described in Chapter 4, "Field Definitions."

Minmax Field Type

Defining a fixed-width or variable-width field as a minmax field is recommended when you expect queries to include field searches over that field. Minmax fields greatly improve retrieval speed for field searches, especially when your document collection is very large. Only fixed-width and variable-width fields can be defined as minmax fields.

Defining minmax field indexes is especially worthwhile if the field's values fall within a specific range and users regularly perform field searches over the field.

You can define a fixed-width or variable-width field as a minmax field in the style.ufl files used to build your collections. To do so, add the /minmax = yes modifier to the appropriate keyword, either fixwidth or varwidth, as shown in the example below.


varwidth: author dd5
/minmax = yes
A minmax field can store a maximum of 256 bytes. If the field value is longer, the Verity engine stores only the first 256 bytes in the minmax field index.
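A short Python sketch of the truncation rule follows. The `minmax_key` helper is hypothetical, and it assumes a single-byte encoding (Latin-1) so that bytes and characters line up; with a multibyte encoding the cut could fall in the middle of a character.

```python
MINMAX_LIMIT = 256  # bytes, the limit stated above


def minmax_key(value, encoding="latin-1"):
    """Return the prefix of a field value that fits in a minmax field index.

    Assumes a single-byte encoding, so 256 bytes == 256 characters.
    """
    return value.encode(encoding)[:MINMAX_LIMIT]


long_a = "x" * 256 + "apple"
long_b = "x" * 256 + "banana"
print(len(minmax_key(long_a)))                  # 256
print(minmax_key(long_a) == minmax_key(long_b))  # True
```

In this model, two values that differ only after byte 256 produce the same stored prefix, so the minmax index by itself cannot tell them apart.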

When minmax fields are defined, the Verity engine creates and maintains WORM (write once, read many) field indexes containing all field values for all documents in a collection. During search processing, the search engine reads the information in these field indexes instead of reading through the actual document records.

If the information in the minmax WORM field indexes is not sufficient for the search engine either to select all of the documents in the collection or to pass over all of them, the search engine must read the individual document records in the collection.
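The three possible outcomes can be illustrated with a simplified Python model that keeps only the smallest and largest field value, which is the idea the name minmax suggests. This is an assumption for illustration: the text above does not describe the actual layout of Verity's field indexes, and `minmax_decision` and `Decision` are hypothetical names. The query is a range search, `low_bound <= field <= high_bound`.

```python
from enum import Enum


class Decision(Enum):
    SELECT_ALL = "select every document"
    SKIP_ALL = "pass over every document"
    READ_RECORDS = "read individual document records"


def minmax_decision(lo, hi, low_bound, high_bound):
    """Decide how to evaluate low_bound <= field <= high_bound, given the
    smallest (lo) and largest (hi) field values in the collection."""
    if hi < low_bound or lo > high_bound:
        # No stored value can fall inside the query range.
        return Decision.SKIP_ALL
    if low_bound <= lo and hi <= high_bound:
        # Every stored value lies inside the query range.
        return Decision.SELECT_ALL
    # The ranges only partly overlap: min and max alone cannot decide.
    return Decision.READ_RECORDS


# Hypothetical collection whose year field runs from 1995 to 2001.
years = [1995, 1998, 2001]
lo, hi = min(years), max(years)
print(minmax_decision(lo, hi, 2002, 2010))  # Decision.SKIP_ALL
print(minmax_decision(lo, hi, 1990, 2005))  # Decision.SELECT_ALL
print(minmax_decision(lo, hi, 1998, 2000))  # Decision.READ_RECORDS
```

Only the last case, where the query range cuts through the stored range, forces a pass over the individual records.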





Copyright © 2002, Verity, Inc. All rights reserved.