Content-Based Retrieval Concepts

Content-Based Retrieval (CBR) provides administrative interfaces that control domain- and server-specific Verity configurations, establish and configure index areas, and initiate and manage index jobs.

Domain and Server Information

The VerityDomainConfiguration class contains properties used by the Verity installation for a particular domain. These properties are common to all Verity servers in a domain, and therefore must be the same across the entire domain. Consequently, there can be only one VerityDomainConfiguration instance per domain.

Although a number of properties on VerityDomainConfiguration can be set, for full-text indexing to be operational in a development environment, you need to set only VerityAdminMasterServerHostname and VerityAdminMasterServerPort. The UserName, UserPassword, UserDomain, and UserGroup properties must all be set for security to be enabled.
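As a rough illustration of the two property groups above, the following Python sketch checks a plain dictionary of property values. The dictionary, the helper functions, and the sample values are hypothetical and are not part of the Content Engine API; only the property names come from the text.

```python
# Hypothetical helpers: a plain dict stands in for a VerityDomainConfiguration.
REQUIRED_FOR_INDEXING = (
    "VerityAdminMasterServerHostname",
    "VerityAdminMasterServerPort",
)
REQUIRED_FOR_SECURITY = ("UserName", "UserPassword", "UserDomain", "UserGroup")


def missing_properties(config, names):
    """Return the names whose values are absent or empty in config."""
    return [n for n in names if config.get(n) in (None, "")]


def indexing_operational(config):
    """True when both master server properties are set."""
    return not missing_properties(config, REQUIRED_FOR_INDEXING)


def security_enabled(config):
    """True only when all four user properties are set."""
    return not missing_properties(config, REQUIRED_FOR_SECURITY)


# Placeholder values for a development environment.
dev_config = {
    "VerityAdminMasterServerHostname": "verity-host.example",
    "VerityAdminMasterServerPort": 9950,
}
```

With only the two master server properties set, indexing_operational(dev_config) is true while security_enabled(dev_config) is false, which mirrors the minimal development setup described above.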

The VerityServerConfiguration class contains the properties used by the Verity installation for a particular server. These properties can differ from one server to the next. A VerityServerConfiguration object is contained in the SubsystemsConfiguration property of the Domain, Site, VirtualServer, and ServerInstance classes. The VerityServerConfiguration instance to be used is determined by these classes in this order: ServerInstance, VirtualServer, Site, and Domain.
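The lookup order can be sketched as follows, assuming the order means that the first configuration found wins, starting from the most specific scope. The dictionary layout, the function name, and the sample values are assumptions made for this illustration and are not part of the Content Engine API.

```python
# Hypothetical model: each scope maps to its server configuration, or None if unset.
RESOLUTION_ORDER = ("ServerInstance", "VirtualServer", "Site", "Domain")


def resolve_server_configuration(configs_by_scope):
    """Return (scope, config) for the first scope in the order that has a configuration."""
    for scope in RESOLUTION_ORDER:
        config = configs_by_scope.get(scope)
        if config is not None:
            return scope, config
    return None, None


# Placeholder values; only the Site and Domain levels carry a configuration here.
scopes = {
    "Domain": {"MaxRowsPerCollection": 1000000},
    "Site": {"MaxRowsPerCollection": 500000},
    "VirtualServer": None,
    "ServerInstance": None,
}
chosen_scope, chosen_config = resolve_server_configuration(scopes)
```

In this example the Site-level configuration is chosen, because neither the ServerInstance nor the VirtualServer level supplies one.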

Index Areas and Collections

An index area is a file system directory containing the information necessary to perform full-text indexing. This information is updated and queried by the CBR engine (Verity). There is a many-to-one relationship between index areas and an object store: each index area is dedicated to a single object store, but an object store can have several index areas. You can have multiple index areas for an object store on a single file system, or distribute the indexing information for an object store in multiple index areas across file systems.

The following is a sample configuration showing the relationships of ObjectStore, VerityIndexArea, and VerityCollection objects:

The IndexArea base class contains the fundamental configuration properties for the index area, and is subclassed to provide Verity-specific properties in the VerityIndexArea class. The file system location of a Verity index area is stored in the VerityIndexArea.RootDirectoryPath property.

Each Verity index area holds multiple Verity collections (VerityCollection objects). A Verity collection references the full-text indexing information specific to a base class and any of its subclasses. There is a many-to-one relationship between Verity collections and a Verity index area.

The full-text indexing information referenced by a VerityCollection object is stored in a corresponding collection maintained by the Verity software. The name of this collection is identified in the VerityCollection.CollectionName property.

VerityCollection objects are created automatically in the associated index area, as needed. When an indexable class is instantiated, the VerityCollection object associated with its base class is used to reference the full-text indexing information. If there is no existing VerityCollection object associated with the base class, a new VerityCollection object (as well as the corresponding collection maintained by the Verity software) is created.
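The create-on-first-use behavior described above resembles a get-or-create lookup keyed by base class. The following is a minimal in-memory sketch of that idea; the CollectionRegistry class and the collection naming scheme are invented for illustration and do not reflect the Content Engine API or the names that Verity actually assigns.

```python
# Hypothetical in-memory model; not the Content Engine API.
BASE_CLASSES = ("Document", "Annotation", "Folder", "CustomObject")


class CollectionRegistry:
    def __init__(self):
        self.collections = {}  # base class name -> collection name

    def collection_for(self, base_class):
        """Return the collection for base_class, creating it on first use."""
        if base_class not in BASE_CLASSES:
            raise ValueError(f"{base_class} is not an indexable base class")
        if base_class not in self.collections:
            # The naming scheme here is invented for the sketch.
            self.collections[base_class] = f"{base_class}_collection_1"
        return self.collections[base_class]


registry = CollectionRegistry()
first = registry.collection_for("Document")    # creates a collection
second = registry.collection_for("Document")   # reuses the existing collection
third = registry.collection_for("Folder")      # creates a second collection
```

The second request for "Document" returns the collection created by the first, so only two collections exist after the three calls.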

The IndexArea and VerityCollection classes have a ResourceStatus property, which can have a value of closed, full, open, or standby.

For VerityIndexArea objects, the ResourceStatus property is automatically set to "full" when the number of associated collections reaches the value of VerityIndexArea.MaxCollections. In this case, no more Verity collections can be created in this index area. If a replacement Verity index area having a status of "standby" exists, it is automatically set to "open". Verity collections can only be created in an index area that is "open". If there is no VerityIndexArea instance in the object store having a status of "open", no new Verity collections will be created in the object store.
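The status transitions for index areas can be modeled roughly as shown below. This is a simplified sketch under stated assumptions: the IndexAreaModel class and create_collection function are hypothetical, the first "open" area found is used, and the first "standby" area found is promoted. The actual selection logic of the server is not described in the text.

```python
# Hypothetical model of VerityIndexArea status handling; not the Content Engine API.
class IndexAreaModel:
    def __init__(self, name, max_collections, status="open"):
        self.name = name
        self.max_collections = max_collections  # stands in for MaxCollections
        self.status = status                    # stands in for ResourceStatus
        self.collection_count = 0


def create_collection(areas):
    """Create a collection in an open area; return the area name, or None."""
    target = next((a for a in areas if a.status == "open"), None)
    if target is None:
        return None  # no open index area: no new collection is created
    target.collection_count += 1
    if target.collection_count >= target.max_collections:
        target.status = "full"
        # Promote a standby area only if one exists.
        standby = next((a for a in areas if a.status == "standby"), None)
        if standby is not None:
            standby.status = "open"
    return target.name


areas = [IndexAreaModel("area1", 2), IndexAreaModel("area2", 1, status="standby")]
created_in = [create_collection(areas) for _ in range(4)]
```

In this run the first two collections go to area1, which then becomes "full" and causes area2 to open. The third collection fills area2, and the fourth request creates nothing because no "open" area remains.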

Note: Verity collections existing in any index area in the object store will be searched during full-text queries, regardless of the status of the index area (IndexArea.ResourceStatus) or the status of the collection (VerityCollection.ResourceStatus). Only by deleting the VerityCollection instance can you prevent a collection from being included in full-text queries. Deletion of VerityCollection data is done automatically when an index job either disables full-text indexing, or rebuilds an index (thereby deleting and recreating the VerityCollection instance).

For VerityCollection objects, the ResourceStatus property is automatically set to "full" when the number of rows in the collection reaches the value of VerityServerConfiguration.MaxRowsPerCollection. In this case, no more data can be written to this collection. If another Verity collection having a status of "open" exists, data will automatically be written to one of the "open" collections. If no collection is "open", a collection having a status of "standby" is selected and is automatically set to "open". Data can be written only to Verity collections having a status of "open". VerityCollection instances will continue to be created automatically (as needed) in the index area until the limit specified in VerityIndexArea.MaxCollections is reached.
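The write path for collections follows a similar pattern, sketched below. The CollectionModel class and write_row function are hypothetical, and the sketch does not model the automatic creation of new collections; a return value of None marks the point at which the server would need to create one, subject to VerityIndexArea.MaxCollections.

```python
# Hypothetical model of VerityCollection status handling; not the Content Engine API.
class CollectionModel:
    def __init__(self, name, status="open"):
        self.name = name
        self.status = status  # stands in for ResourceStatus
        self.rows = 0


def write_row(collections, max_rows_per_collection):
    """Write one row to an open collection; return its name, or None."""
    target = next((c for c in collections if c.status == "open"), None)
    if target is None:
        # No open collection: fall back to a standby collection and open it.
        target = next((c for c in collections if c.status == "standby"), None)
        if target is None:
            return None  # a new collection would have to be created here
        target.status = "open"
    target.rows += 1
    if target.rows >= max_rows_per_collection:
        target.status = "full"  # no more data can be written to this collection
    return target.name


collections = [CollectionModel("c1"), CollectionModel("c2", status="standby")]
written_to = [write_row(collections, 2) for _ in range(5)]
```

With a limit of two rows per collection, the first two rows fill c1, the next two open and then fill c2, and the fifth write finds neither an "open" nor a "standby" collection.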

Although you can set the status of a Verity collection to "closed", this is useful primarily for troubleshooting purposes. A collection will be searched during full-text queries regardless of its ResourceStatus setting.

Indexing and Index Jobs

Indexing aggregates data (in the form of indexes) to support full-text searches. Only objects and string properties enabled for CBR are included in full-text searches. Whether an object or property is enabled is determined by the IsCBREnabled property on the associated ClassDefinition and PropertyDefinitionString classes. See Content Searches for more information.

The indexable classes are the Document, Annotation, Folder, and CustomObject base classes, and any subclasses of these classes.
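To illustrate the rule, the following sketch walks a class hierarchy upward until it reaches one of the four base classes. The superclass map and the class names Invoice, PaidInvoice, and AuditRecord are invented for the example. The sketch does not call the Content Engine API and ignores the IsCBREnabled setting.

```python
# Hypothetical hierarchy check; class names other than the four base classes are invented.
INDEXABLE_BASE_CLASSES = {"Document", "Annotation", "Folder", "CustomObject"}


def is_indexable(class_name, superclass_of):
    """Walk up the hierarchy until an indexable base class or the root is reached."""
    seen = set()  # guards against accidental cycles in the map
    current = class_name
    while current is not None and current not in seen:
        if current in INDEXABLE_BASE_CLASSES:
            return True
        seen.add(current)
        current = superclass_of.get(current)
    return False


# Maps each class to its direct superclass; None marks a class with no indexable ancestor.
hierarchy = {"Invoice": "Document", "PaidInvoice": "Invoice", "AuditRecord": None}
```

Here is_indexable("PaidInvoice", hierarchy) is true because PaidInvoice descends from Document, while is_indexable("AuditRecord", hierarchy) is false.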

Indexing is done automatically for all CBR-enabled objects and properties, and is a batched, asynchronous operation, so the result of an indexing operation will not be evident immediately. The IndexJob class enables you to track the status of an index job, as well as initiate one. Generally, you would initiate an index job only to rebuild an index that is corrupted, or to accommodate a configuration change.

During an indexing operation, all currently indexed data is still available for use in full-text searches. However, new index data may become available while a full-text query is in progress. In this case, the query will return duplicate matches, because it will use both old and new indexed data. Old copies of indexed data are removed when the index job has completed, and duplicate matches will then no longer occur.

The IndexJobItem base class is subclassed to provide for particular types of index jobs: classes (IndexJobClassItem), Verity collections (IndexJobCollectionItem), and single objects (IndexJobSingleItem).

Class (IndexJobClassItem) and collection (IndexJobCollectionItem) index jobs require a table scan on the database, even if the amount of data to be indexed is minimal. A significant amount of time is required to perform a table scan on a large table. The database table scans are performed once for all classes to be indexed, and once for all collections to be indexed. To minimize the number of table scans required, use a single index job operation for all classes or collections to be indexed for the same table.
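The effect of combining items into one index job can be estimated with a simple count. The sketch below assumes one table scan per combination of item kind and table within each job, which is one reading of the paragraph above. The table name "docs_table" and the table_scans function are placeholders, not names from the product.

```python
# Hypothetical cost model: each job is a list of (item kind, table name) pairs.
def table_scans(jobs):
    """Count scans, assuming one scan per (item kind, table) pair within each job."""
    return sum(len({(kind, table) for kind, table in job}) for job in jobs)


# Three class items whose data lives in the same (placeholder) table.
items = [("class", "docs_table"), ("class", "docs_table"), ("class", "docs_table")]

separate_jobs = [[item] for item in items]  # one index job per class
combined_job = [items]                      # one index job for all three classes

scans_separate = table_scans(separate_jobs)
scans_combined = table_scans(combined_job)
```

Under this assumption, three separate jobs cost three scans of the table, while the single combined job costs one.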