IBM FileNet P8, Version 5.2.1

Indexing parameters

You can configure the following collection-specific parameters to improve performance: MaxMergeDocs, MaxMergeMB, MergeFactor, and BufferSize.

IBM Content Search Services indexing parameters are preconfigured for optimal performance. However, in certain situations you might want to adjust these parameters, for example, if you have very large indexes or very small files, or if you want to tune the balance between indexing performance and search performance.

You can modify indexing parameters by using the configuration tool.

MaxMergeDocs

The MaxMergeDocs parameter defines the largest segment (measured by the number of documents) that can be merged with other segments in the index.

There is a trade-off between overall indexing throughput and segment merge time. If you specify a low value for the MaxMergeDocs parameter (for example, 100,000 documents), your segments are limited in size. In this case, segment merges are quicker and indexing flows more smoothly. However, if you index a very large volume of content, the index accumulates numerous segments and indexing throughput degrades over time.

If you specify a high value for the MaxMergeDocs parameter (for example, 100,000,000 or 500,000,000 documents), you get fewer segments (until the index becomes very large) and the overall indexing throughput is better. However, segment merges take more time and you might encounter timeouts during indexing.

Typically, the value of MaxMergeDocs should be higher for collections of small documents and lower for collections of large documents.
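
As a rough illustration of the cap described above, the following Python sketch filters a list of segments by document count. The function name, the sample segment sizes, and the inclusive comparison are assumptions made for illustration. They are not taken from IBM Content Search Services.

```python
def mergeable_segments(doc_counts, max_merge_docs):
    """Return the segments that are small enough to take part in a merge.

    Assumption: a segment is eligible when its document count does not
    exceed max_merge_docs (an inclusive bound chosen for this sketch).
    """
    return [count for count in doc_counts if count <= max_merge_docs]


# Hypothetical segment sizes, in documents.
segments = [40_000, 90_000, 250_000, 1_200_000]

# Low cap: only the two smallest segments can still be merged.
print(mergeable_segments(segments, 100_000))

# High cap: every segment remains eligible for merging.
print(mergeable_segments(segments, 100_000_000))
```

With the low cap, the larger segments are left as they are, which keeps each merge short but lets the number of segments grow.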

MaxMergeMB

The MaxMergeMB parameter defines the largest segment (measured by the physical size of the file) that can be merged with other segments in the index.

There is a trade-off between overall indexing throughput and segment merge time. If you specify a low value for the MaxMergeMB parameter (for example, 500 MB), your segments are limited in size. In this case, segment merges are quicker and indexing flows more smoothly. However, if you index a very large volume of content, the index accumulates numerous segments, and both indexing throughput and search performance degrade over time.

If you specify a high value for the MaxMergeMB parameter (for example, 50,000 MB or 100,000 MB), you get fewer segments (until the index becomes very large) and the overall indexing throughput is better. However, segment merges take more time and you might encounter timeouts during indexing.
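
The effect of a low or high size cap on the number of segments can be sketched with a toy model. The model below is an assumption made for illustration, not the actual merge algorithm of IBM Content Search Services: every flush creates one segment of a fixed size, and whenever enough equally sized segments exist, and each one is no larger than the cap, they are merged into one. The function name and the sample values are hypothetical.

```python
def simulate_segments(num_flushes, flush_mb, max_merge_mb, merge_factor=10):
    """Return the segment sizes (in MB) left after num_flushes flushes.

    Toy model (assumption): merge_factor segments of identical size are
    merged as soon as they exist, unless their size exceeds max_merge_mb.
    """
    segments = []
    for _ in range(num_flushes):
        segments.append(flush_mb)
        merged = True
        while merged:
            merged = False
            for size in sorted(set(segments)):
                count = segments.count(size)
                if size <= max_merge_mb and count >= merge_factor:
                    # Replace merge_factor segments of this size with one
                    # segment that holds their combined content.
                    others = [s for s in segments if s != size]
                    leftover = [size] * (count - merge_factor)
                    segments = others + leftover + [size * merge_factor]
                    merged = True
                    break
    return segments


low_cap = simulate_segments(num_flushes=1000, flush_mb=50, max_merge_mb=500)
high_cap = simulate_segments(num_flushes=1000, flush_mb=50, max_merge_mb=50_000)

print(len(low_cap), max(low_cap))    # 10 segments, the largest is 5000 MB
print(len(high_cap), max(high_cap))  # 1 segment of 50000 MB
```

In this model, the low cap leaves ten segments for the same content that the high cap consolidates into one, at the cost of one much larger merge.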

MergeFactor

The MergeFactor parameter defines the number of segments that are merged at a time and also controls the total number of segments that can accumulate in the index.

There is a trade-off between frequent, small merges (for example, two segments at a time) and less frequent, large merges (for example, 10 segments at a time). Modifying the merge factor does not typically impact performance.
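
To make the trade-off concrete, the following Python sketch counts merges and remaining segments under a simplified model. The model is an assumption made for illustration: every flush creates one segment, and segments of equal size are merged as soon as merge_factor of them exist, with no size cap. The function name is hypothetical.

```python
def merge_stats(num_flushes, merge_factor):
    """Return (number of merges, segments remaining) after num_flushes flushes.

    Under the simplified model, level k sees num_flushes // merge_factor**k
    merges, and the segments that remain correspond to the digits of
    num_flushes written in base merge_factor.
    """
    merges = 0
    power = merge_factor
    while power <= num_flushes:
        merges += num_flushes // power
        power *= merge_factor

    remaining = 0
    n = num_flushes
    while n:
        remaining += n % merge_factor
        n //= merge_factor
    return merges, remaining


print(merge_stats(1000, 2))   # (994, 6): many small merges
print(merge_stats(1000, 10))  # (111, 1): fewer, larger merges
```

In this model, at most merge_factor - 1 segments can accumulate at each size level, which is one way to read the statement that the merge factor controls the total number of segments.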

BufferSize

The BufferSize parameter specifies the amount of RAM that can be used for buffering added documents before the documents are flushed as a new segment.

There is a trade-off between frequent, small flushes to disk and less frequent, large flushes to disk. In some cases, you can improve performance by increasing the value of the BufferSize parameter. For example, when you index a single collection of small documents, increasing the buffer size improves performance, especially for the first 100,000 documents in the index.
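
A back-of-the-envelope estimate shows why a larger buffer means fewer flushes. The sketch below assumes that a buffered document occupies about as much RAM as its size on disk, which is a simplification. The function name, the 4 KB average document size, and the buffer sizes are hypothetical values, not product defaults.

```python
import math


def estimated_flushes(num_docs, avg_doc_kb, buffer_mb):
    """Estimate how many flushes are needed to index num_docs documents.

    Assumption: each document consumes avg_doc_kb of buffer space, and the
    buffer is flushed as a new segment each time it fills up.
    """
    total_kb = num_docs * avg_doc_kb
    buffer_kb = buffer_mb * 1024
    return math.ceil(total_kb / buffer_kb)


# 100,000 small documents of about 4 KB each.
print(estimated_flushes(100_000, avg_doc_kb=4, buffer_mb=16))  # 25
print(estimated_flushes(100_000, avg_doc_kb=4, buffer_mb=64))  # 7
```

Each flush creates a new segment, so fewer flushes also mean fewer small segments to merge later.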

Last updated: October 2015

© Copyright IBM Corporation 2015.