The amount of text extracted from a document exceeds a configured
limit, which prevents some of the text from being indexed.
Symptoms
When a text document is indexed, the following error appears in the
IBM® Content Search Services error log:
Error details: IQQI0005E The document with ID <id> cannot be indexed.
Causes of the problem: IQQP0012W The document <id> exceeds the
limit of the size of the document in text format. The indexed document
will be truncated.
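To see which documents are affected, the error log can be scanned for the IQQP0012W warning. The following is a minimal sketch, not an IBM-supplied utility: the function name is hypothetical, and the pattern assumes that each warning contains the phrase quoted above, with the document ID as a single token without spaces.

```python
import re

# Pattern derived from the message text quoted above. The exact log layout
# is an assumption; adjust the pattern if your log differs.
TRUNCATION_WARNING = re.compile(r"IQQP0012W The document (\S+) exceeds")


def truncated_document_ids(log_text: str) -> list[str]:
    """Return the IDs of documents reported as truncated, in first-seen order."""
    # Collapse line wraps so that a message split across lines still matches.
    flattened = " ".join(log_text.split())
    seen: list[str] = []
    for doc_id in TRUNCATION_WARNING.findall(flattened):
        if doc_id not in seen:
            seen.append(doc_id)
    return seen
```

Pass the contents of the error log as a string. The returned IDs indicate which documents to review when you decide whether the truncated text matters for your queries.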
Causes
The number of characters in a document is greater than
the max.text.size parameter value.
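The condition can be illustrated with a short sketch that compares the length of a document's extracted text against a limit. This is an illustration only: the function name and the returned fields are hypothetical, and it assumes that one Python character (one Unicode code point) corresponds to one character as the server counts them, which might not hold exactly.

```python
def truncation_report(extracted_text: str, max_text_size: int) -> dict:
    """Compare the length of extracted text against a character limit."""
    # len() counts Unicode code points; the server's own counting rule
    # is an assumption here.
    length = len(extracted_text)
    overflow = max(0, length - max_text_size)
    return {
        "characters": length,
        "limit": max_text_size,
        "truncated": overflow > 0,
        "characters_not_indexed": overflow,
    }
```

For example, a document with 10 characters of extracted text and a limit of 8 is reported as truncated, with 2 characters not indexed.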
Resolving the problem
If your queries are still working as expected, you might
consider the text that is not being indexed as unimportant for search
purposes. In this case, you can optionally ignore the error message.
Otherwise, if your queries are not returning the expected documents,
increase the maximum amount of text that can be indexed for a document.
To increase the maximum amount of text that can be indexed for a document:
- Use the configuration tool to increase the value of the maxHeapSize
parameter (see Configuration tool parameters). This parameter determines
the amount of memory that is allocated to the Java™ Virtual Machine (JVM)
during server startup.
Tip: For more information about the value to set, see "Heap memory
consumption" in Parameters that influence performance. As indicated in
that topic, the heap size determines the maximum value possible for the
maxTextSize parameter that you set in the next step.
- For each collection, use the configuration tool to increase the value
of the maxTextSize parameter (see Configuration tool parameters).
- Restart the IBM Content Search Services server.
- Test your changes in the user environment.
Important: If you set the maxTextSize parameter too high, the server
can run out of memory. How high is too high depends on the operating
system, the available memory, and the parameters that you set in this
procedure.