Verity collection partitions

Verity (also known as Autonomy) collections can be partitioned by date to address high-volume indexing of date-sensitive documents, such as e-mail messages archived in a Content Engine repository. For documents in which a date-based property is a primary identifier, collection partitioning reduces the number of collections that need to be searched, thus reducing query retrieval time.

How partitioning works

An object store has two partition-related properties: Date Property and Interval. For Date Property, you specify the name of the partitioning property, which is a date/time property defined on one or more classes in the object store. For example, to partition documents of an e-mail class by the receivedDate property of the class, set Date Property to receivedDate. For document classes that have the partitioning property, you must configure the property so that its value can be set only when a document is created. (If partitioning property values are changed after document creation, search results are invalid.)

For Interval, you set the length of time in months that you want each of the Verity collections in the object store to span. When the collections are created within an index area, each collection is automatically associated with a date range, defined by the Start Date and End Date index area properties. See Object store (CBR tab) for more information about setting the Date Property and Interval properties. See Index area properties (Index area collections tab) for information about viewing the Start Date and End Date for a collection.

When a new document containing a partitioning property is indexed, the value of the property is compared to the date ranges of the collections in an index area. If the value is within the date range of a collection and the collection is not full, the document is indexed into that collection. If the partitioning property value of the document does not fit within the date ranges of the existing collections, a new collection is created, with its Start Date and End Date property values determined by the object store Interval property and the partitioning date value of the document. The document is then indexed into the new collection.

The Interval property of an object store sets the length of time, in months, that each Verity collection spans. Possible values are 1, 2, 3, 4, 6, and 12. The default value is null; a null value or a value of 0 disables partitioning. Intervals span whole numbers of months and do not span year-end boundaries. A partitioned collection contains documents whose partitioning dates are equal to or greater than the Start Date of the collection and less than the End Date of the collection. For example, if the partitioning date of a document is December 15, 2008, and the interval is three months, the Start Date and End Date properties of its collection are set to October 1, 2008 and January 1, 2009, with dates expressed in UTC.
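The following sketch shows how the Start Date and End Date of a partition could be derived from a partitioning date and an Interval value. It assumes, as the December 2008 example implies, that partitions are aligned to January 1 of each year (every allowed interval divides 12 evenly) and that dates are compared in UTC. The function name partition_bounds is hypothetical and is not part of any Content Engine API.

```python
from datetime import datetime, timezone

# Interval values listed in this topic; null or 0 disables partitioning.
VALID_INTERVALS = {1, 2, 3, 4, 6, 12}


def partition_bounds(value: datetime, interval: int) -> tuple[datetime, datetime]:
    """Return the (Start Date, End Date) of the partition containing value.

    Start Date is inclusive and End Date is exclusive. Because every valid
    interval divides 12, partitions are aligned to January 1 and never
    straddle a year-end boundary.
    """
    if interval not in VALID_INTERVALS:
        raise ValueError(f"unsupported interval: {interval!r}")
    # Assumption: naive datetimes are treated as UTC; aware ones are converted.
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    else:
        value = value.astimezone(timezone.utc)
    # Zero-based month (0 = January) at which the partition starts.
    start_month0 = ((value.month - 1) // interval) * interval
    end_month0 = start_month0 + interval  # may equal 12, meaning next January
    start = datetime(value.year, start_month0 + 1, 1, tzinfo=timezone.utc)
    end = datetime(value.year + end_month0 // 12, end_month0 % 12 + 1, 1,
                   tzinfo=timezone.utc)
    return start, end


start, end = partition_bounds(datetime(2008, 12, 15), 3)
print(start.date(), end.date())  # 2008-10-01 2009-01-01
```

Using integer division on a zero-based month keeps the arithmetic simple: the end month can only reach 12, which the last line maps to January of the following year.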

If a document does not have a partitioning property, or if the value of the property is null, the document is indexed into any non-full collection that has Start Date and End Date properties both set to null.
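The routing rules described above for dated and null-valued documents can be modeled with a small in-memory sketch. The Collection class, its capacity field, and the route function are hypothetical. Two behaviors are assumptions that this topic does not state: when a collection with a matching date range is full, a new collection with the same range is created; and when no non-full unpartitioned collection exists for a document with a null partitioning value, a new unpartitioned collection is created. The maximum-collections limit and overflow to other index areas are not modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Collection:
    """Hypothetical stand-in for a Verity collection in one index area."""
    start: Optional[datetime]  # None models a null Start Date
    end: Optional[datetime]    # None models a null End Date
    capacity: int              # hypothetical document limit for "full"
    docs: List[str] = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.docs) >= self.capacity

    def accepts(self, value: Optional[datetime]) -> bool:
        if value is None:
            # Null partitioning values go only to unpartitioned collections.
            return self.start is None and self.end is None
        if self.start is None or self.end is None:
            return False
        return self.start <= value < self.end  # Start inclusive, End exclusive


def partition_bounds(value: datetime, interval: int):
    """Year-aligned [start, end) bounds; assumes value is a UTC-aware datetime."""
    start_month0 = ((value.month - 1) // interval) * interval
    end_month0 = start_month0 + interval
    start = datetime(value.year, start_month0 + 1, 1, tzinfo=timezone.utc)
    end = datetime(value.year + end_month0 // 12, end_month0 % 12 + 1, 1,
                   tzinfo=timezone.utc)
    return start, end


def route(doc_id: str, value: Optional[datetime], collections: List[Collection],
          interval: int, capacity: int = 1000) -> Collection:
    """Index doc_id into the first matching non-full collection, else a new one."""
    for c in collections:
        if c.accepts(value) and not c.is_full():
            c.docs.append(doc_id)
            return c
    # No usable collection: create one (assumed behavior for the full/null cases).
    if value is None:
        start, end = None, None
    else:
        start, end = partition_bounds(value, interval)
    new = Collection(start, end, capacity)
    new.docs.append(doc_id)
    collections.append(new)
    return new
```

For example, with an interval of 3, a document dated December 15, 2008 creates a collection spanning October 1, 2008 to January 1, 2009, and a later document dated November 2, 2008 is added to that same collection while it has room. A document with a null partitioning value never lands in a dated collection.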

If an index area has reached its maximum number of collections, so that no new collection can be created, IBM® Legacy Content Search Engine checks for open index areas in which to index the documents.

Reindexing requirement

If you upgrade from a previous Content Engine version that did not support Verity collection partitions, all existing object stores are upgraded with the Date Property and Interval properties, and all existing Verity collections are upgraded with the Start Date and End Date properties. After the upgrade, the partitioning feature is initially disabled, with the values of these properties set to null.

To enable partitioning on an existing object store, set the Date Property and Interval properties of the object store. As you index new objects that contain the partitioning property, new Verity collections are created with date ranges; that is, their Start Date and End Date properties are set to non-null values.

Note, however, that Verity collections that existed before the upgrade to the FileNet P8 Platform 4.5.1 release remain unpartitioned; that is, their Start Date and End Date properties are set to null. If the unpartitioned collections contain documents of classes that have the partitioning property, queries based on the partitioning property cannot retrieve documents from the unpartitioned collections. Therefore, for queries to return correct results, you must reindex all existing Verity collections after enabling the partitioning feature. To reindex collections, see Collection indexing.

Content-based retrieval (CBR) queries

To restrict a search to the relevant partitioned collections, you must include the partitioning property in your queries; otherwise, all collections are searched. For more information, see CBR queries using Enterprise Manager.
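The effect of including the partitioning property in a query can be pictured as a collection-selection step: only collections whose date range overlaps the queried range are searched, and a query with no condition on the partitioning property searches every collection. The sketch also reflects the behavior described under Reindexing requirement, where date-restricted queries do not retrieve documents from unpartitioned collections. The names CollectionRange and collections_to_search are hypothetical; this is an illustration, not the way Content Engine plans queries.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional


class CollectionRange(NamedTuple):
    """Hypothetical view of a collection's name, Start Date, and End Date."""
    name: str
    start: Optional[datetime]  # None models a null Start Date
    end: Optional[datetime]    # None models a null End Date


def collections_to_search(collections: List[CollectionRange],
                          lo: Optional[datetime] = None,
                          hi: Optional[datetime] = None) -> List[CollectionRange]:
    """Select collections for a query on the half-open range [lo, hi).

    If both bounds are None, the query has no condition on the partitioning
    property, so every collection is searched.
    """
    if lo is None and hi is None:
        return list(collections)
    selected = []
    for c in collections:
        if c.start is None or c.end is None:
            continue  # unpartitioned collections are skipped by date queries
        # Two half-open ranges overlap when each starts before the other ends.
        if (hi is None or c.start < hi) and (lo is None or lo < c.end):
            selected.append(c)
    return selected
```

For example, with quarterly collections for October to December 2008 and January to March 2009, a query for December 2008 selects only the first collection, while a query that omits the partitioning property searches both, plus any unpartitioned collections.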