You can improve indexing performance by monitoring system metrics and adjusting configuration settings.
Use the Content Platform Engine content-based retrieval counters to view the overall content-based retrieval process. The counters provide real-time details for a particular index job or index area; average metrics for batch sizes; processing time; documents created, updated, and deleted; content-based retrieval search metrics; and more. For more information, see Counter interpretation.
You can also gather information about the IBM Content Search Services server (such as memory usage and queue size) during indexing. This information is written to a CSV file in the YourCSSfolder\log directory. Information is also collected by IBM® System Dashboard for Enterprise Content Management, if available. Most of the information that is stored in the monitor.csv file can also be viewed as meter counters in the dashboard, in addition to other diagnostic information.
The following IBM Content Search Services server information is provided in both the monitor.csv output file and IBM System Dashboard for Enterprise Content Management, unless otherwise indicated.
Name | Description |
---|---|
Time | The current time (in seconds). This information is not displayed in IBM System Dashboard for Enterprise Content Management. |
Total number of processed documents | The total number of documents that were processed by IBM Content Search Services for all full-text indexes since server startup. Processing includes adds, updates, and deletes. Documents are counted regardless of whether the processing was successful or not. |
Total number of add requests that failed | The total number of failed add requests that were processed by IBM Content Search Services for all full-text indexes since the server started. This information is not displayed in the IBM System Dashboard for Enterprise Content Management. |
Total number of successful add requests | The total number of successful add requests that were processed by IBM Content Search Services for all full-text indexes since the server started. This information is not displayed in the IBM System Dashboard for Enterprise Content Management. |
Total size of successful add requests (KB) | The total memory size (in KB) of successful add requests that were processed by IBM Content Search Services for all full-text indexes since the server started. This information is not displayed in the IBM System Dashboard for Enterprise Content Management. |
Total number of delete requests that failed | The total number of failed delete requests that were processed by IBM Content Search Services for all full-text indexes since the server started. This information is not displayed in the IBM System Dashboard for Enterprise Content Management. |
Total number of successful delete requests | The total number of successful delete requests that were processed by IBM Content Search Services for all full-text indexes since the server started. This information is not displayed in the IBM System Dashboard for Enterprise Content Management. |
Total size of processed documents | The total memory size (in KB) of documents that were processed by IBM Content Search Services for all collections since server startup. |
Documents in input queue | The number of documents in the IBM Content Search Services input queue. |
Input queue size (in bytes) | The memory size (in bytes) of documents in the IBM Content Search Services input queue. |
Documents in the output queue | The number of documents in the IBM Content Search Services output queue. |
Output queue size (in bytes) | The memory size (in bytes) of documents in the IBM Content Search Services output queue. |
Documents waiting for preprocessing | The number of documents in the initial stage of the IBM Content Search Services indexing pipeline that are waiting for preprocessing. |
Documents currently in preprocessing | The number of documents in the second stage of the IBM Content Search Services indexing pipeline, which is preprocessing (text extraction, tokenization, and language analysis). |
Documents waiting for indexing | The number of documents in the third stage of the IBM Content Search Services indexing pipeline that are waiting to be indexed. |
Documents currently being indexed | The number of documents in the final stage of the IBM Content Search Services indexing pipeline. |
Number of concurrent queries | The number of ongoing queries that are currently running in the system. This number includes all searches that have started but not yet completed at the time of measurement. This information is not displayed in the IBM System Dashboard for Enterprise Content Management. |
Total number of queries | The total number of search requests that were processed by IBM Content Search Services since the server started. This information is not displayed in the IBM System Dashboard for Enterprise Content Management. |
Used heap memory (MB) | The amount of heap memory that is used by the JVM before Java™ memory garbage collection. This information is not displayed in IBM System Dashboard for Enterprise Content Management. |
Thread count | Number of threads that are used by the IBM Content Search Services server. This information is not displayed in IBM System Dashboard for Enterprise Content Management. |
System load | Provides an indication of the average system load for the previous minute, as provided by the JVM. This information might not be available on all platforms. This information is not displayed in IBM System Dashboard for Enterprise Content Management. |
Open file descriptors | The number of open operating system file descriptors. This information is available only for AIX, Linux, and Solaris systems on which the lsof utility is installed. |
Free physical memory | Provides an indication of the free physical memory on the computer, as provided by the JVM. This information might not be available on all platforms. This information is not displayed in IBM System Dashboard for Enterprise Content Management. |
Batches in progress | |
Active merges | The number of index segment merges that are currently taking place. |
Merge size (MB) | The total size (in megabytes) of index segment merges that are currently taking place. |
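
Because the add-request counters are cumulative since server start, the most recent row of the file is enough to estimate how many add requests have failed overall. The following Python sketch illustrates the idea. It assumes that the CSV header strings match the names in the table above, which is not confirmed here, so check the header row of your own monitor.csv file and adjust the `FAILED` and `OK` constants. The function name and the sample data are hypothetical.

```python
import csv
import io

# Assumed header strings; verify them against your own monitor.csv file.
FAILED = "Total number of add requests that failed"
OK = "Total number of successful add requests"


def add_failure_rate(csv_text):
    """Return failed / (failed + successful) add requests, or None if unknown."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return None
    last = rows[-1]  # counters are cumulative, so the last row has the totals
    failed = int(last[FAILED])
    ok = int(last[OK])
    total = failed + ok
    return failed / total if total else None


# Made-up sample data for illustration only.
sample = (
    f"Time,{FAILED},{OK}\n"
    "10,0,50\n"
    "20,5,95\n"
)
print(add_failure_rate(sample))
```

To use this with a real file, read its contents into a string and pass the string to `add_failure_rate`.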
You can troubleshoot IBM Content Search Services by monitoring the status of documents at each stage of the indexing pipeline.
Stage in document indexing pipeline | Column in the monitor.csv file |
---|---|
1. Input queue: contains documents that are waiting for preprocessing | Number of documents that are waiting for preprocessing |
2. Document preprocessing: text extraction, tokenization, language analysis | Number of documents that are currently being preprocessed |
3. Output queue: contains preprocessed documents that are waiting to be indexed | Number of documents that are waiting to be indexed |
4. Index: contains indexed documents | Number of documents that are currently being indexed |
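
One rough way to read these four values together is to look for the stage that holds the most documents at a given moment. The following Python sketch shows that heuristic. The heuristic, the function name, and the stage labels are illustrative assumptions, not a documented diagnostic procedure: a large count in one stage only hints at where documents are accumulating.

```python
def busiest_stage(waiting_preprocessing, in_preprocessing,
                  waiting_indexing, being_indexed):
    """Return the name of the pipeline stage holding the most documents.

    Returns None when every stage is empty. On a tie, the earliest
    stage in the pipeline is returned.
    """
    stages = [
        ("input queue", waiting_preprocessing),
        ("document preprocessing", in_preprocessing),
        ("output queue", waiting_indexing),
        ("index", being_indexed),
    ]
    name, count = max(stages, key=lambda stage: stage[1])
    return name if count > 0 else None


# Made-up values taken from one hypothetical row of monitor.csv.
print(busiest_stage(3, 2, 120, 4))
```

In this made-up row, most documents sit in the output queue, which might suggest that indexing is slower than preprocessing. Comparing several consecutive rows is likely to be more reliable than judging from a single sample.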
The monitor.csv file is rotated like a log file (monitor0.csv, monitor1.csv, and so on). By default, queue status information is printed to the file every 10 seconds. To change the frequency, use the configuration tool to set a new value for the monitorQueuesFrequency parameter. You can disable queue monitoring by specifying a value of zero for the monitorQueuesFrequency parameter.
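
When you analyze more than one rotated file, it helps to order the files by their numeric suffix rather than alphabetically, because an alphabetical sort places monitor10.csv before monitor2.csv. The following Python sketch assumes the monitorN.csv naming pattern described above. The function name is hypothetical, and the sketch makes no assumption about which suffix holds the newest data.

```python
import re

# Matches rotated monitor files such as monitor0.csv or monitor12.csv.
_MONITOR_NAME = re.compile(r"^monitor(\d+)\.csv$")


def sort_monitor_files(names):
    """Return the rotated monitor file names ordered by numeric suffix.

    Names that do not match the monitorN.csv pattern are ignored.
    """
    matched = []
    for name in names:
        m = _MONITOR_NAME.match(name)
        if m:
            matched.append((int(m.group(1)), name))
    return [name for _, name in sorted(matched)]


# Made-up directory listing for illustration only.
print(sort_monitor_files(["monitor10.csv", "monitor2.csv", "css.log", "monitor0.csv"]))
```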