Collection Features


This section introduces basic information about how to build and configure Verity collections for your search application.

What Is a Collection?

A collection holds metadata, that is, data about a set of documents, in an architecture optimized for searching. This metadata includes search-optimized indexes, document locations, required fields related to the documents and, optionally, custom fields related to the documents.

Collection architecture supports persistent searching using multiple search processes and/or clients. An individual collection is stored in a directory that is portable across multiple platforms.

Features of the Collection Architecture

Collection architecture supports the features described in the sections that follow.

Contents of Collection Indexes

An individual collection represents a logical group of documents plus a set of metadata about those documents. The information stored for a collection includes various word indexes, an internal documents table that contains document field information, and logical pointers to the actual document files.
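
To make the three parts concrete, here is a minimal in-memory sketch of a word index, a documents table, and logical pointers to document files. It is an illustration only and does not reflect the actual Verity on-disk format; the names ToyCollection and DocumentRecord are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    """One row of a hypothetical documents table."""
    location: str                               # logical pointer to the document file
    fields: dict = field(default_factory=dict)  # required and custom fields


class ToyCollection:
    """Illustrative stand-in for a collection (assumed structure, not Verity's)."""

    def __init__(self):
        self.documents = []                 # documents table; list position is the doc id
        self.word_index = defaultdict(set)  # word -> set of doc ids

    def add(self, location, text, **fields):
        """Index the text of one document and record its location and fields."""
        doc_id = len(self.documents)
        self.documents.append(DocumentRecord(location, dict(fields)))
        for word in text.lower().split():
            self.word_index[word].add(doc_id)
        return doc_id

    def search(self, word):
        """Return the locations of documents that contain the word."""
        ids = sorted(self.word_index.get(word.lower(), ()))
        return [self.documents[i].location for i in ids]


coll = ToyCollection()
coll.add("/docs/a.txt", "Quarterly sales report", title="Sales")
coll.add("/docs/b.txt", "Sales forecast", title="Forecast")
print(coll.search("sales"))  # ['/docs/a.txt', '/docs/b.txt']
```

Note that the search returns pointers to the documents rather than the documents themselves, which matches the idea that a collection stores metadata about documents and not the document files.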

Any Verity collection-building application can read, and add documents to, any collection to which it has valid access. Concurrent access to a collection by multiple Verity sessions is enabled through file sharing and synchronized through file locking and collection servicing.
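
The general idea of synchronizing writers through file locking can be sketched with a lock file created inside the collection directory. This is a generic technique shown under stated assumptions, not the locking scheme Verity uses; the function name collection_lock and the file name update.lock are hypothetical.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def collection_lock(collection_dir, timeout=5.0, poll=0.05):
    """Hold an exclusive lock file in collection_dir for the duration of a with-block."""
    lock_path = os.path.join(collection_dir, "update.lock")  # hypothetical file name
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process at a time can create the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {collection_dir}")
            time.sleep(poll)
    try:
        yield lock_path
    finally:
        # Close before removing so the cleanup also works on Windows.
        os.close(fd)
        os.remove(lock_path)
```

A writer would wrap its update in "with collection_lock(path):" while other writers wait or time out. A sketch like this does not handle a process that crashes while holding the lock, which is one reason a separate servicing process is useful.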

Dynamic Access to Documents

A typical Verity application controls numerous collections, with the number of documents in each collection kept at a size that is optimized for searching. Collections have many features that ensure consistent and continual dynamic access to documents.

General housekeeping is a service that cleans up the collection files, deletes older disk files that are no longer needed, monitors the collection for problems, and prevents the optional system log from exceeding a certain size. Other background services preserve the integrity of indexed documents while making those documents accessible for searching at all times.

The Verity engine controls an application's access to collections by maintaining collection metadata on an ongoing basis. When a collection-building application submits documents to be indexed, the Verity engine processes the index request and updates the collection by creating a new set of metadata, so that searches in progress are not disrupted. When the update is complete, the engine switches to the new set of metadata, and the old set is cleaned up by the general housekeeping and background servicing functions.
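
The update pattern described above (build a new set of metadata, switch to it, then clean up the old set) can be sketched as follows. This is a simplified illustration under assumed file names (gen-N.json and CURRENT) and hypothetical function names; it is not the Verity engine's implementation.

```python
import json
import os
import tempfile


def read_current(collection_dir):
    """Return the active generation number, or None if nothing is published."""
    try:
        with open(os.path.join(collection_dir, "CURRENT"), encoding="utf-8") as f:
            return int(f.read())
    except FileNotFoundError:
        return None


def publish_generation(collection_dir, metadata):
    """Write a new metadata set, then switch readers to it in one atomic step."""
    new_gen = (read_current(collection_dir) or 0) + 1
    gen_path = os.path.join(collection_dir, f"gen-{new_gen}.json")
    with open(gen_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f)
    # Readers keep using the old set until the pointer file is replaced.
    fd, tmp_path = tempfile.mkstemp(dir=collection_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(str(new_gen))
    os.replace(tmp_path, os.path.join(collection_dir, "CURRENT"))
    return new_gen


def load_metadata(collection_dir):
    """Load the metadata set that searches should currently use."""
    gen = read_current(collection_dir)
    if gen is None:
        return None
    with open(os.path.join(collection_dir, f"gen-{gen}.json"), encoding="utf-8") as f:
        return json.load(f)


def housekeeping(collection_dir):
    """Delete metadata sets older than the current one; return the removed names."""
    current = read_current(collection_dir)
    removed = []
    for name in os.listdir(collection_dir):
        if name.startswith("gen-") and name.endswith(".json"):
            if current is not None and int(name[4:-5]) < current:
                os.remove(os.path.join(collection_dir, name))
                removed.append(name)
    return sorted(removed)
```

Because the pointer file is replaced in a single step, a search that starts at any moment sees either the complete old set or the complete new set, never a partially written one.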

Universal Document Support

Any document repository, including one that contains documents in a variety of formats, can be made searchable by building collections.

Verity applications allow users to search, navigate, and organize information for any supported document format. Document filters support indexing and viewing. Many filter types are available: KeyView filters for numerous native formats, the zone filter for HTML and tagged ASCII formats, and the PDF filter for Adobe Acrobat PDF. For documents in native formats such as Microsoft Word and Lotus 1-2-3, filters generate WYSIWYG document representations for viewing.
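
Conceptually, a filter turns a document in some format into text that can be indexed, and the right filter is chosen by document type. The sketch below shows that dispatch idea with two toy filters selected by file extension. The registry, the function names, and the crude tag stripping are all assumptions for illustration; they are not the KeyView, zone, or PDF filters.

```python
import os
import re


def plain_text_filter(raw: bytes) -> str:
    """Decode plain text, replacing bytes that are not valid UTF-8."""
    return raw.decode("utf-8", errors="replace")


def html_filter(raw: bytes) -> str:
    """Very rough HTML-to-text conversion: replace each tag with a space."""
    return re.sub(r"<[^>]+>", " ", raw.decode("utf-8", errors="replace"))


# Hypothetical registry that maps file extensions to filter functions.
FILTERS = {
    ".txt": plain_text_filter,
    ".htm": html_filter,
    ".html": html_filter,
}


def extract_text(path: str, raw: bytes) -> str:
    """Pick a filter by extension and return whitespace-normalized text."""
    ext = os.path.splitext(path)[1].lower()
    try:
        chosen = FILTERS[ext]
    except KeyError:
        raise ValueError(f"no filter registered for {ext!r}") from None
    return " ".join(chosen(raw).split())


print(extract_text("page.html", b"<p>Hello <b>world</b></p>"))  # Hello world
```

Supporting another format in this sketch means registering one more function, which mirrors how separate filter types cover separate document formats.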

A gateway is a set of access methods for accessing a repository of documents. Verity collection-building applications provide a default gateway for accessing document text and related field information.

Optional Indexes

Optional indexes based on collections provide advanced search features such as Category Search, Parametric Search, Adaptive Ranking, and topic search. The indexes for topic searches are called topic sets. For information about building topic sets, see the Verity K2 Intelligent Classification Guide.

For access to many other advanced features, you can license and use Verity K2 Developer. For more information about these features, see the Verity K2 Developer Getting Started Guide.


Copyright © 2002, Verity, Inc. All rights reserved.