Inside a Verity Collection
A collection consists of word indexes, documents tables, and optional indexes used for specialized functions. Each collection is subdivided into units called partitions, and for each partition there is a word index and documents table.
Types of Indexes
A Verity collection has a word index for each partition plus one or more optional indexes. These are the basic index types:
Documents Table
The fields of a documents table are either persistent or transitory. All fields are persistent except the Score field, which is transitory.
Persistent Fields
Persistent fields are retained between sessions, and they can be internal or external. Internal fields are stored in indexed document tables managed internally by the Verity engine. External fields are stored in indexed document tables in external repositories, such as applications and relational databases. Examples of persistent fields are title and author, information that is not necessarily part of the text of every document.
Internal fields are defined in the document descriptor file (style.ddd). External fields are defined by some external application and can be addressed using the Verity gateway suites, described in the Verity gateway documentation.
The VdkVgwKey is a special Verity field that is used as the persistent document handle. By default, the Verity engine assigns the document file name to VdkVgwKey.
Transitory Fields
Transitory fields exist only for the duration of a session and are discarded when the session ends. As a logical construct, the transitory field is added to the table format. An example of a transitory field is a document's score. Other transitory fields can be defined using the Verity Developer Kit, as described in that product's documentation.
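The distinction between persistent and transitory fields can be illustrated with a small Python model. This is only a sketch and not Verity code: the DocumentRow class, the new_row helper, and the to_stored method are hypothetical names invented for the illustration. Only the VdkVgwKey field name, its default of the document file name, and the transitory nature of the score come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DocumentRow:
    """One row of a hypothetical documents table (illustration only)."""
    vdk_vgw_key: str                                            # persistent document handle
    persistent: Dict[str, str] = field(default_factory=dict)   # e.g. title, author
    score: Optional[float] = None                               # transitory: exists only during a session

    def to_stored(self) -> Dict[str, str]:
        """Return only the fields that would survive the session (score is omitted)."""
        return {"VdkVgwKey": self.vdk_vgw_key, **self.persistent}


def new_row(file_name: str, **fields: str) -> DocumentRow:
    # Mirrors the default described above: the key is the document file name.
    return DocumentRow(vdk_vgw_key=file_name, persistent=dict(fields))


row = new_row("report.txt", title="Annual Report", author="J. Smith")
row.score = 0.87           # assigned during a search session
stored = row.to_stored()   # what remains once the session is over
```

In this model the score lives on the row while a session is active but never reaches the stored form, which is the behavior the text attributes to transitory fields.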
What Are Partitions?
When indexing documents, the Verity engine stores document metadata in units called partitions. Each partition contains metadata (typically a full-word index) for a set of 1 to 64K documents. The Verity engine does not actually copy your documents; rather, a partition contains all of the metadata associated with the documents that makes them searchable, including:
Partitions have a scalable architecture that supports incremental searching over changing collections. By subdividing collections into partitions, the Verity engine can search the collection one partition at a time and provide search results after each, rather than having to search the entire collection first and only then provide the results. Given this scenario, search performance should be uniform regardless of whether a collection's initial size is 1 megabyte or 1 gigabyte.
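The partition-at-a-time behavior described above can be sketched in Python with a toy word index. This is an illustration under stated assumptions, not the Verity engine's implementation: build_partitions and search are hypothetical names, the word index is a plain dictionary from word to document keys, and reading "64K" as 65,536 is an assumption.

```python
from typing import Dict, Iterator, List, Set

# Assumption: "64K" in the text is read as 65,536 documents.
MAX_DOCS_PER_PARTITION = 64 * 1024

# Toy word index for one partition: word -> keys of documents containing it.
Partition = Dict[str, Set[str]]


def build_partitions(docs: Dict[str, str],
                     max_docs: int = MAX_DOCS_PER_PARTITION) -> List[Partition]:
    """Index documents into partitions of at most max_docs documents each."""
    partitions: List[Partition] = []
    current: Partition = {}
    count = 0
    for key, text in docs.items():
        if count == max_docs:          # current partition is full, start a new one
            partitions.append(current)
            current, count = {}, 0
        for word in text.lower().split():
            current.setdefault(word, set()).add(key)   # store metadata, not the document
        count += 1
    if count:
        partitions.append(current)
    return partitions


def search(partitions: List[Partition], word: str) -> Iterator[List[str]]:
    """Yield one batch of matching document keys per partition, as each is searched."""
    for index in partitions:
        yield sorted(index.get(word.lower(), set()))


docs = {
    "a.txt": "verity search engine",
    "b.txt": "search collections",
    "c.txt": "partition index",
}
parts = build_partitions(docs, max_docs=2)   # small limit to force two partitions
batches = list(search(parts, "search"))
```

Because search is a generator, a caller can consume the first batch before later partitions have been examined, which is the incremental behavior the text describes.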
Copyright © 2002, Verity, Inc. All rights reserved.