Document Filters and Formatting


A virtual document represents the document text to be indexed and viewed by the application. The definition of a virtual document includes the document layout and the filter specification.

The virtual document definition is provided in the style.dft file. By default, the document layout consists of the entire document contents (the whole document file), beginning at row 1, column 1. The default filter specification identifies all Verity filters to be used, so all document formats can be indexed without configuration.

Universal Filter and the Helper Filters

The Verity universal filter is a document filter that, like any other filter, produces indexable (or viewable) text. The difference is that it filters each document dynamically according to its type, using a number of "helper" subfilters. For example, Microsoft Word documents are filtered with one set of filters (using the KeyView Filter Kit), HTML documents are filtered with a different set (the current zone filter), and PDF documents are filtered with the special PDF filter.

The advantage of the universal filter is that it removes the need to specify the document type and character set of documents before creating the collection, and it allows multiple document types written in multiple character sets to be indexed into the same collection.
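The dispatch idea described above can be pictured with a small sketch. It is not Verity code and does not use any Verity API: the function names, the type-sniffing heuristic, and the UTF-8 decoding are all assumptions made for illustration, and the PDF helper is deliberately left as a stub.

```python
import re
from typing import Callable, Dict


def sniff_type(data: bytes) -> str:
    """Guess a document type from its leading bytes (hypothetical heuristic)."""
    head = data[:1024].lstrip().lower()
    if head.startswith(b"%pdf-"):
        return "pdf"
    if head.startswith(b"<!doctype html") or b"<html" in head:
        return "html"
    return "text"


def filter_html(data: bytes) -> str:
    """Helper subfilter: drop tags and collapse whitespace (assumes UTF-8)."""
    text = data.decode("utf-8", errors="replace")
    text = re.sub(r"<[^>]+>", " ", text)
    return " ".join(text.split())


def filter_text(data: bytes) -> str:
    """Helper subfilter: plain text only needs decoding (assumes UTF-8)."""
    return data.decode("utf-8", errors="replace")


def filter_pdf(data: bytes) -> str:
    """Placeholder: real PDF extraction is outside the scope of this sketch."""
    raise NotImplementedError("PDF extraction is not implemented here")


# One helper per detected type; a real filter would cover many more formats.
HELPERS: Dict[str, Callable[[bytes], str]] = {
    "html": filter_html,
    "text": filter_text,
    "pdf": filter_pdf,
}


def universal_filter(data: bytes) -> str:
    """Pick a helper from the detected type and return indexable text."""
    return HELPERS[sniff_type(data)](data)
```

Because the type is detected per document, the caller never declares a format up front, which is what lets mixed document types share one collection.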

The universal filter is configurable, as described in Chapter 6, "Document Filters and Formatting."

Using Your Own Filter

By default, the Verity engine parses each document using the universal filter. If you develop your own filter, you can launch it through a system call option available in the style.dft file. With this option, the output of the system call is treated as the virtual document that is passed to the indexing engine.
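Since the output of the system call becomes the virtual document, a custom filter can be any program that writes the text to be indexed to standard output. The sketch below shows one possible shape for such a program. The tag-stripping conversion, the function names, and the convention of receiving the document path as the first command-line argument are assumptions for illustration; the style.dft syntax that launches the command is not shown here and should be taken from the Verity documentation.

```python
import re
import sys
from typing import List, TextIO


def to_plain_text(raw: bytes) -> str:
    """Hypothetical conversion: decode, drop markup-like tags, tidy whitespace."""
    text = raw.decode("utf-8", errors="replace")
    text = re.sub(r"<[^>]+>", " ", text)
    # Collapse runs of whitespace on each line and discard empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def run_filter(path: str, out: TextIO = sys.stdout) -> None:
    """Read the source document and write the filtered text to `out`."""
    with open(path, "rb") as handle:
        out.write(to_plain_text(handle.read()))
    out.write("\n")


def main(argv: List[str]) -> int:
    """Entry point; assumes the document path arrives as the only argument."""
    if len(argv) != 2:
        print("usage: myfilter.py <document-path>", file=sys.stderr)
        return 2
    run_filter(argv[1])
    return 0
```

Saved as a script (the name myfilter.py is hypothetical), the file would end with a call such as sys.exit(main(sys.argv)) so that the filtered text goes to standard output, where the system call option can pick it up.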





Copyright © 2002, Verity, Inc. All rights reserved.