Full-text indexes

Full-text indexes contain the information used to find objects by searching for embedded words or phrases. Full-text indexes are also referred to as Autonomy collections or Verity collections. Lists of Autonomy collections are maintained in index areas.

Autonomy collection
An Autonomy collection contains full-text indexing information and is represented by a VerityCollection object in the object store database and a set of files in a file system. These files are maintained by the Autonomy software and contain the actual full-text information. Each Autonomy collection is a member of only one index area. The VerityCollection database object also associates this collection with the IndexArea object and identifies the type of object being indexed (folder, annotation, and so on).
Index area
An index area is associated with a particular object store and is represented by an IndexArea object in the object store database. The IndexArea object contains the path name of a file system directory that is used to store the files for the collection. IBM® Legacy Content Search Engine queries and updates this directory as new indexable content is added to or removed from an object store.

Multiple index areas can hold data for a single object store, but each index area can hold data for only one object store. Because an index area has only one file system directory, you use multiple index areas for a single object store to spread the full-text indexing information across multiple file systems, either for geographic distribution or for scalability.

NOTE   If multiple index areas are configured for an object store, Content Engine chooses the index area to write the full-text indexing information to in the following order of preference, to optimize performance:

1. The index area on the same site as the virtual server.
2. The index area on the same site as the object store.
3. The index area on the same site as the storage area.
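The preference order above can be sketched as a simple lookup. This is only an illustration of the ordering, not Content Engine code: the IndexArea class, the choose_index_area function, and the site names are hypothetical, and returning None when no site matches is an assumption, because the behavior in that case is not described here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IndexArea:
    """Hypothetical stand-in for an index area: a name and the site it lives on."""
    name: str
    site: str


def choose_index_area(
    candidate_areas: Sequence[IndexArea],
    virtual_server_site: str,
    object_store_site: str,
    storage_area_site: str,
) -> Optional[IndexArea]:
    """Return the first candidate that matches the documented site preference order."""
    # Order of preference: virtual server site, object store site, storage area site.
    for preferred_site in (virtual_server_site, object_store_site, storage_area_site):
        for area in candidate_areas:
            if area.site == preferred_site:
                return area
    # Assumption: no fallback is modeled when none of the three sites match.
    return None


areas = [IndexArea("ia_paris", "Paris"), IndexArea("ia_tokyo", "Tokyo")]
chosen = choose_index_area(areas, "Tokyo", "Paris", "Paris")
print(chosen.name if chosen else "no match")
```

In this example the virtual server is on the Tokyo site, so the Tokyo index area is preferred even though the object store and storage area are on the Paris site.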

An index area has one of four statuses: open (accepting new indexing information), or closed, on standby, or full (not accepting new indexing information). An index area can be created in standby mode; it is then opened automatically when a new index area is needed and no other index area is open. The index area also stores Autonomy-specific parameters that describe its properties, such as its Autonomy collections, Autonomy search servers, and the maximum number of collections.
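The status rules above can be modeled as a small state sketch. The AreaStatus and AreaState names are hypothetical, and choosing the first standby area in list order is an assumption; how a standby area is selected when several exist is not stated here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AreaStatus(Enum):
    OPEN = "open"
    CLOSED = "closed"
    STANDBY = "standby"
    FULL = "full"


@dataclass
class AreaState:
    """Hypothetical record of an index area's name and current status."""
    name: str
    status: AreaStatus


def accepts_new_indexing(area: AreaState) -> bool:
    # Only an open index area accepts new indexing information.
    return area.status is AreaStatus.OPEN


def open_standby_if_needed(areas: List[AreaState]) -> Optional[AreaState]:
    """Open a standby area only when no area is currently open."""
    if any(accepts_new_indexing(a) for a in areas):
        return None  # an open area already exists, so nothing changes
    for area in areas:
        if area.status is AreaStatus.STANDBY:
            area.status = AreaStatus.OPEN  # assumption: first standby area wins
            return area
    return None  # no standby area is available to open


areas = [AreaState("ia_1", AreaStatus.FULL), AreaState("ia_2", AreaStatus.STANDBY)]
opened = open_standby_if_needed(areas)
print(opened.name if opened else "nothing opened")
```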
Autonomy collection size
A single Autonomy collection can store up to 8 million entries. Each content element adds one entry to an Autonomy collection, and each object (document, custom object, and so on) adds one further entry that covers all of its CBR-enabled string properties. Each object store can have an unlimited number of Autonomy collections, associated with an unlimited number of index areas. The Autonomy collections are created automatically as needed when new data is full-text indexed.
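For rough capacity planning, the entry counts above can be turned into arithmetic. The function names below are hypothetical, and the result is only a lower bound: it assumes every collection is filled to the 8 million entry limit, and that an object with no CBR-enabled string properties adds no extra entry, neither of which is confirmed here.

```python
MAX_ENTRIES_PER_COLLECTION = 8_000_000  # limit stated for a single Autonomy collection


def entries_for_object(content_elements: int, has_cbr_string_properties: bool) -> int:
    """One entry per content element, plus one entry for all CBR-enabled string properties."""
    # Assumption: no extra entry when the object has no CBR-enabled string properties.
    return content_elements + (1 if has_cbr_string_properties else 0)


def min_collections_needed(total_entries: int) -> int:
    """Lower bound on the number of collections, using ceiling division."""
    return -(-total_entries // MAX_ENTRIES_PER_COLLECTION)


# Example: 10 million documents, each with one content element and
# CBR-enabled string properties, produce 2 entries per document.
total = 10_000_000 * entries_for_object(1, True)
print(total, min_collections_needed(total))
```

With these inputs the documents produce 20 million entries, which need at least 3 collections.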

You can affect the size and performance characteristics of an Autonomy collection by configuring features in the style set for the collection. For more information, see the P8CSE50 style set reference.