The constructors.xml file contains the XML filter constructors that associate your surplus XML elements with Content Engine classes. Each IBM® Content Search Services server has an associated constructors.xml file in the <server-home>/config directory.
You can configure multiple XML filter constructors. Each XML filter constructor associates a set of Content Engine classes with a set of surplus XML elements. For optimal indexing performance, associate a Content Engine class only with the surplus XML elements that apply to the XML content of that class. For example, suppose that you have one XML filter constructor that associates five Content Engine classes with five surplus XML elements. If each surplus XML element applies to only one of the Content Engine classes, the filtering configuration is roughly 20% efficient: for each XML document, only one surplus element potentially applies, but the XML filtering utility must check for the presence of all five elements. In this example, the optimal configuration is five separate XML filter constructors, each of which associates one Content Engine class with its one surplus XML element.
Add one <constructor> element to the constructors.xml file for each XML filter constructor that you want to configure. The following example shows the <constructor> element format:
<?xml version="1.0" encoding="UTF-8"?>
<constructors>
  <constructor>
    <name>XMLFilter1</name>
    <class>com.ibm.filenet.cse.cascade.xmlfilter.XMLFilterConstructor</class>
    <batchSize>15</batchSize>
    <customConfig><![CDATA[
Type=XMLFilter
SymbolicClassName=LaurelTree,OliveTree,OrangeTree
ElementsToRemove=/document/item/rawitemdata,//font,/Configuration
CleanupOldDirTime=1
SleepIntervalForCleanup=2000
]]>
    </customConfig>
  </constructor>
</constructors>
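To make the ElementsToRemove value in the example above more concrete, the following is a hypothetical XML document of the kind that might be stored as the content of a LaurelTree object. The document itself, and every element name in it other than those listed in ElementsToRemove, is invented for illustration. The comments assume an XPath-like reading of the entries (a leading / for an absolute path from the root, a leading // for a match at any depth); this page does not spell out the exact matching rules, so treat the annotations as a sketch rather than a statement of product behavior.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical document content; names other than rawitemdata and font are invented. -->
<document>
  <title>Quarterly report</title>
  <item>
    <!-- Assumed to match /document/item/rawitemdata (absolute path from the root). -->
    <rawitemdata>AAECAwQFBgc=</rawitemdata>
    <summary>
      Text that is meant to remain searchable.
      <!-- Assumed to match //font (element found at any depth). -->
      <font face="Arial">presentation markup</font>
    </summary>
  </item>
  <!-- No element here corresponds to /Configuration, so under the same
       assumption that entry would simply find nothing in this document. -->
</document>
```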
To configure multiple XML filter constructors, add multiple <constructor> elements as shown in the following example:
<?xml version="1.0" encoding="UTF-8"?>
<constructors>
  <constructor>
    …
  </constructor>
  <constructor>
    …
  </constructor>
  <constructor>
    …
  </constructor>
</constructors>
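As an illustration of the performance guidance given earlier, the following sketch fills in a three-constructor layout so that each Content Engine class is paired only with the surplus XML element that applies to it. The pairing of classes to elements and the constructor names (XMLFilterLaurel, XMLFilterOlive, XMLFilterOrange) are assumptions made for this example; the batch size and cleanup values are simply copied from the single-constructor example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<constructors>
  <!-- Hypothetical pairing: LaurelTree content is assumed to contain only rawitemdata. -->
  <constructor>
    <name>XMLFilterLaurel</name>
    <class>com.ibm.filenet.cse.cascade.xmlfilter.XMLFilterConstructor</class>
    <batchSize>15</batchSize>
    <customConfig><![CDATA[
Type=XMLFilter
SymbolicClassName=LaurelTree
ElementsToRemove=/document/item/rawitemdata
CleanupOldDirTime=1
SleepIntervalForCleanup=2000
]]>
    </customConfig>
  </constructor>
  <!-- Hypothetical pairing: OliveTree content is assumed to contain only font elements. -->
  <constructor>
    <name>XMLFilterOlive</name>
    <class>com.ibm.filenet.cse.cascade.xmlfilter.XMLFilterConstructor</class>
    <batchSize>15</batchSize>
    <customConfig><![CDATA[
Type=XMLFilter
SymbolicClassName=OliveTree
ElementsToRemove=//font
CleanupOldDirTime=1
SleepIntervalForCleanup=2000
]]>
    </customConfig>
  </constructor>
  <!-- Hypothetical pairing: OrangeTree content is assumed to contain only Configuration. -->
  <constructor>
    <name>XMLFilterOrange</name>
    <class>com.ibm.filenet.cse.cascade.xmlfilter.XMLFilterConstructor</class>
    <batchSize>15</batchSize>
    <customConfig><![CDATA[
Type=XMLFilter
SymbolicClassName=OrangeTree
ElementsToRemove=/Configuration
CleanupOldDirTime=1
SleepIntervalForCleanup=2000
]]>
    </customConfig>
  </constructor>
</constructors>
```

Each constructor name is distinct, and with this layout the XML filtering utility would check each document for one surplus element instead of three.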
The following XML shows the <constructor> element format with a placeholder in place of each required item of information. The details of each item are described in the table that follows the XML.
<?xml version="1.0" encoding="UTF-8"?>
<constructors>
  <constructor>
    <name>filter-name</name>
    <class>filter-class</class>
    <batchSize>filter-batchsize</batchSize>
    <customConfig><![CDATA[
Type=config-type
SymbolicClassName=config-classlist
ElementsToRemove=config-elementlist
CleanupOldDirTime=config-cleanupage
SleepIntervalForCleanup=config-sleepinterval
]]>
    </customConfig>
  </constructor>
</constructors>
Item | Description |
---|---|
filter-name | The name that you assign for the XML filter. The name must be unique within constructors.xml. |
filter-class | The name of the filter class. This name must be com.ibm.filenet.cse.cascade.xmlfilter.XMLFilterConstructor. |
filter-batchsize | The maximum number of XML documents that can be sent to the XML filter in one request. |
config-type | The name of the configuration type. The name must be XMLFilter. |
config-classlist | The Content Engine class names of the objects whose XML document content is filtered. Specify the names in a comma-delimited list.
A class name is not specific to an object store. For example, suppose that you specify LaurelTree as one of the class names. If the LaurelTree class exists in two object stores, the XML content of the LaurelTree objects in both object stores is filtered. When an XML document is filtered, the document is examined for the presence of surplus XML elements. An XML document is always filtered in this sense. Any detected surplus XML elements are removed from the XML document before the remainder of the document is indexed. If no surplus XML elements exist within the document, however, no XML elements are removed from it. For an XML document to be filtered, the document must be the sole content element of a Content Engine object. The content of objects with multiple content elements is treated as text (as opposed to XML) even if some or all of the content element files are XML documents. The encoding of an XML document is assumed to be UTF-8 unless otherwise specified in the document. |
config-elementlist | The surplus XML elements that are removed from the XML content. Specify the names in a comma-delimited list. Specify each name in one of the following ways:
|
config-cleanupage | The minimum age, in hours, of the filtering work directories that are deleted during cleanup runs. For example, if the value of this item is 1, the XML filtering utility deletes any work directory that is one hour old or older.
Work directories are created as part of the XML filtering process in the following location:
One work directory is created for each index batch. The name of a work directory is |
config-sleepinterval | The number of milliseconds between work directory cleanup runs. The XML filtering utility deletes work directories during the cleanup runs. |
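As a worked example of how the two cleanup items interact, the following sketch sets CleanupOldDirTime to 24 (hours) and SleepIntervalForCleanup to 3600000 (milliseconds, that is, 60 × 60 × 1000 = one hour). Read against the table above, a cleanup run would start about once an hour and delete work directories that are at least 24 hours old. The constructor name XMLFilterDailyCleanup and both values are illustrative assumptions, not recommended settings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<constructors>
  <constructor>
    <!-- Hypothetical name; it only needs to be unique within constructors.xml. -->
    <name>XMLFilterDailyCleanup</name>
    <class>com.ibm.filenet.cse.cascade.xmlfilter.XMLFilterConstructor</class>
    <batchSize>15</batchSize>
    <!-- CleanupOldDirTime is in hours; SleepIntervalForCleanup is in milliseconds. -->
    <customConfig><![CDATA[
Type=XMLFilter
SymbolicClassName=LaurelTree
ElementsToRemove=/document/item/rawitemdata
CleanupOldDirTime=24
SleepIntervalForCleanup=3600000
]]>
    </customConfig>
  </constructor>
</constructors>
```

By comparison, the values in the first example on this page (1 and 2000) correspond to a cleanup run roughly every two seconds that deletes work directories one hour old or older.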
Related tasks
For information about setting surplus XML elements for a server, see Setting XML elements as non-searchable.