You can improve performance by adjusting configuration parameters that control the available heap memory size, the queue sizes, the number of indexing threads, and the number of preprocessing threads. Changing these parameters is especially effective when your collection contains large documents. Use the configuration tool to set these parameters.
During indexing and searching, IBM Content Search Services uses heap memory to store the indexed documents, the preprocessing and indexing queues, and the index memory structures. To optimize the performance of IBM Content Search Services, it is important to configure the maximum heap size of the JVM and to set the queue sizes and file size limits accordingly. You can configure the maximum heap size during installation and upgrade, and by using the configuration tool.
The maxHeapSize parameter sets the maximum heap size for the IBM Content Search Services server. The default is 1.5 GB for 32-bit JVMs and 4 GB for 64-bit JVMs. The value must be between 1.5 GB and the maximum amount of memory that the operating system allows, which depends on whether it is a 32-bit or 64-bit system. For example, IBM Content Search Services on a 32-bit Windows system cannot be configured to use more than 1.8 GB of heap memory. On a 64-bit system, this limit does not apply. However, the heap size that you allocate should take into account the amount of available physical memory. Allocating more heap memory than the computer physically has can degrade performance and can also result in out-of-memory errors.
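The following Python sketch expresses the heap size rules above as a simple check. The function `check_max_heap_size` is hypothetical and is not part of IBM Content Search Services; you still set the value with the configuration tool or the response file. The 1.8 GB default for 32-bit systems is taken from the 32-bit Windows example and may differ on other 32-bit platforms, and you supply the physical memory figure yourself.

```python
MIN_HEAP_GB = 1.5  # documented lower bound for maxHeapSize


def check_max_heap_size(heap_gb, is_64bit, physical_memory_gb, limit_32bit_gb=1.8):
    """Return a list of problems with a candidate maxHeapSize (empty if none).

    Hypothetical helper. limit_32bit_gb defaults to the 32-bit Windows
    example (1.8 GB); other 32-bit operating systems may allow a different value.
    """
    problems = []
    if heap_gb < MIN_HEAP_GB:
        problems.append(f"{heap_gb} GB is below the {MIN_HEAP_GB} GB minimum")
    if not is_64bit and heap_gb > limit_32bit_gb:
        problems.append(f"{heap_gb} GB exceeds the {limit_32bit_gb} GB limit of a 32-bit system")
    if heap_gb > physical_memory_gb:
        problems.append(f"{heap_gb} GB exceeds the {physical_memory_gb} GB of physical memory")
    return problems


print(check_max_heap_size(4, is_64bit=True, physical_memory_gb=16))
print(check_max_heap_size(2, is_64bit=False, physical_memory_gb=4))
```

The check only flags a heap that exceeds physical memory; in practice you should also leave headroom for the operating system and other processes on the computer.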
You can set the maximum heap size when you install or upgrade IBM Content Search Services by specifying the IA_MAX_HEAP_SIZE parameter in the response file. When you set the maximum heap size to a value greater than 2 GB during the installation or upgrade of IBM Content Search Services on a 64-bit operating system, the file size limits for text, XML, and binary documents are increased for new collections. For each 8.3 MB of heap memory over 2 GB, the file size limits are increased by 1 MB, starting from 60 MB up to a maximum of 400 MB:
File size limit = 60 MB + (maximum heap size - 2 GB) / 8.3, up to a maximum of 400 MB
Maximum heap size | File size limits |
---|---|
2 GB | 60 MB |
3 GB | 180 MB |
4 GB | 300 MB |
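To show how the formula produces the table values, here is a small Python sketch. The function name is hypothetical. It assumes that 1 GB is counted as 1000 MB and that fractional results are rounded down; under these assumptions it reproduces the table exactly, whereas counting 1 GB as 1024 MB gives slightly higher values (about 183 MB for a 3 GB heap). It models only the 64-bit installation and upgrade case, which applies to new collections.

```python
import math


def file_size_limit_mb(max_heap_gb, mb_per_gb=1000):
    """Estimate the file size limit (MB) set for new collections.

    Assumptions: 1 GB is counted as mb_per_gb MB (1000 reproduces the table)
    and fractional results are rounded down.
    """
    excess_mb = (max_heap_gb - 2) * mb_per_gb  # heap memory above 2 GB
    if excess_mb <= 0:
        return 60                              # no increase at or below 2 GB
    limit = 60 + excess_mb / 8.3               # +1 MB per 8.3 MB of extra heap
    return min(400, math.floor(limit))         # capped at 400 MB


for heap in (2, 3, 4, 6):
    print(heap, "GB ->", file_size_limit_mb(heap), "MB")
# 2 GB -> 60 MB
# 3 GB -> 180 MB
# 4 GB -> 300 MB
# 6 GB -> 400 MB
```

Under these assumptions, the 400 MB cap is reached at a heap size of about 4.8 GB.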
You can configure the size of the input queue and output queue on the indexing server.
When you size the queues, consider the memory size of the input and output queues relative to the heap size. The required queue size is determined by the memory consumption of the documents in the queue. If you intend to process long documents (for example, 20 MB each), consider increasing both the input and output queue memory sizes and the heap size. Neither the input queue nor the output queue should be larger than 5% of the maximum heap size.
Increasing the queue sizes without enough memory available to the IBM Content Search Services server, or without enough physical memory on the IBM Content Search Services computer, can degrade server performance. In addition, queue sizes greater than 150 MB can degrade performance.
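Taken together, the two queue size guidelines give an upper bound for each queue: 5% of the maximum heap size, but no more than 150 MB. A minimal Python sketch, with a hypothetical function name and all sizes in MB:

```python
QUEUE_CEILING_MB = 150.0  # larger queues can degrade performance


def max_recommended_queue_mb(max_heap_mb):
    """Upper bound for the input queue or the output queue (each separately):
    no more than 5% of the maximum heap size and no more than 150 MB."""
    return min(max_heap_mb * 5 / 100, QUEUE_CEILING_MB)


print(max_recommended_queue_mb(1536))  # 1.5 GB heap
print(max_recommended_queue_mb(4096))  # 4 GB heap
```

With a 4 GB (4096 MB) heap, 5% is about 205 MB, so the 150 MB ceiling applies; with a 1.5 GB (1536 MB) heap, the bound is about 77 MB. This is only an upper bound: the queues must also fit within the memory that is actually available to the server.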
By monitoring the memory consumption of IBM Content Search Services and of the overall system, you can fine-tune the queue sizes and the available heap memory. By monitoring the queues, you can verify that they are large enough. If the queues are typically almost full during indexing, there is no need to increase the queue sizes, because a full queue indicates that the stage that consumes it cannot keep up; a larger queue would hold more documents in memory without increasing throughput. For more information, see Monitoring system metrics to improve indexing performance.
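If you collect queue fill levels while indexing, a check like the following Python sketch can flag a queue that is typically almost full. How you collect the samples is not shown here. The function name, the representation of fill levels as fractions from 0.0 to 1.0, and the interpretation of "typically almost full" as a median fill level of at least 90% are all assumptions.

```python
from statistics import median


def queue_typically_full(fill_samples, full_threshold=0.9):
    """Return True if the queue is typically almost full.

    fill_samples: fractions (0.0 to 1.0) of queue capacity in use, sampled
    during indexing. Assumption: "typically almost full" means the median
    fill level is at or above full_threshold.
    """
    return median(fill_samples) >= full_threshold


samples = [0.95, 0.97, 0.92, 0.99, 0.60]
if queue_typically_full(samples):
    print("Queue is usually full: increasing its size is unlikely to help")
```

Using the median rather than the mean keeps a few short-lived drops in the fill level (such as the 0.60 sample) from hiding a queue that is full most of the time.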
The indexing threads parameter specifies the number of indexing threads that run on the server.
Multiple indexing threads can index documents in parallel, which usually reduces the elapsed time for text index updates. If multiple indexing threads work on the same collection, this benefit is reduced by the coordination that is required to synchronize processing among the threads. For example, four indexing threads that work on four different text indexes achieve a better total throughput than four indexing threads that work on a single text index.
The number of indexing threads should be at least two and should not exceed the number of available processor cores. Ideally, the maximum number of parallel index updates should not exceed the number of indexing threads. The default is four indexing threads. With too many indexing threads or too many parallel index updates, system performance decreases because of the overhead of context switching.
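The indexing thread guidance can be sketched in Python as follows. The function names are hypothetical. `os.cpu_count()` reports logical processors, which can be more than the physical cores that the guidance refers to, so pass the core count explicitly if you know it. The two rules conflict on a single-core machine; the sketch assumes that the minimum of two threads takes precedence.

```python
import os

DEFAULT_INDEXING_THREADS = 4  # documented default


def indexing_thread_count(cores=None, preferred=DEFAULT_INDEXING_THREADS):
    """Pick an indexing thread count: at most one per core, at least two."""
    if cores is None:
        # os.cpu_count() counts logical CPUs, which can exceed physical cores.
        cores = os.cpu_count() or 1
    # Assumption: on a single-core machine the "at least two" rule wins.
    return max(2, min(preferred, cores))


def parallel_updates_ok(parallel_updates, indexing_threads):
    """Ideally, parallel index updates should not outnumber indexing threads."""
    return parallel_updates <= indexing_threads


threads = indexing_thread_count(cores=8)
print(threads, parallel_updates_ok(6, threads))
```

For an 8-core server, this keeps the default of four threads and flags six parallel index updates as exceeding the number of indexing threads.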
The preprocessing threads parameter specifies the number of preprocessing threads that run on IBM Content Search Services servers. The number of preprocessing threads should be at least four and should be equal to or greater than the number of available processors.
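A hypothetical Python helper for the preprocessing thread guidance returns the smallest value that satisfies both conditions; larger values are also consistent with the guidance. As an assumption, it uses `os.cpu_count()`, which counts logical processors, as a stand-in for "available processors".

```python
import os


def min_preprocessing_threads(processors=None):
    """Smallest preprocessing thread count that meets the guidance:
    at least 4, and at least the number of available processors."""
    if processors is None:
        # Assumption: the logical CPU count stands in for available processors.
        processors = os.cpu_count() or 1
    return max(4, processors)


print(min_preprocessing_threads(2))   # small server: the floor of 4 applies
print(min_preprocessing_threads(16))  # larger server: one per processor
```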
The temporary directory parameter specifies the full path to the location of temporary files.
During preprocessing and indexing, IBM Content Search Services stores intermediate files in a temporary folder. You can improve performance by specifying a RAM disk or other fast storage (such as SSD or striped disk storage) for the temporary directory.