Customized metadata mapping from OmniFind Enterprise Edition

Overview
OmniFind Enterprise Edition appends special metadata to crawled documents. When the documents are passed to the IBM Content Analyzer PEAR module, the module can convert this metadata into a form that IBM Content Analyzer recognizes. You can then view the metadata in Text Miner along with the text analytics data generated by IBM Content Analyzer.
Individual users can customize the metadata conversion rules of the PEAR module. For convenience, the PEAR module includes a default set of conversions.
This document describes the structure of OmniFind Enterprise Edition metadata, the default mapping rule, and how to customize the mapping.
OmniFind Enterprise Edition metadata
OmniFind Enterprise Edition appends two kinds of metadata: document metadata and fields.
Document metadata is a fixed set of metadata that is used primarily by the system. Currently, the following items are defined in the document metadata:
Note that not all items are guaranteed to be filled, and the item definitions can change in future releases.
Fields are more flexible metadata. In addition to the default fields that crawlers assign to crawled documents, users can define their own fields. A field value may or may not be contained in the document itself. For more details, refer to the OmniFind Enterprise Edition manuals.
Default mapping
A default metadata mapping is included with the IBM Content Analyzer PEAR module for convenience. By default, the PEAR module converts all metadata according to the following rules:
Source (OmniFind Enterprise Edition metadata) | Target (IBM Content Analyzer metadata)
Document metadata | Category under .ofee
Field whose value is outside the document | Category under .offield
Field whose value is inside the document | Text name
For the first two types of metadata, the metadata name becomes the category name and the metadata value becomes the category value. To show this metadata in Text Miner, create the corresponding categories in your category tree. If a field value resides in the document content, the metadata is regarded as a label of that text segment, so the field name is converted into a text name, as in the last row. Text Miner then shows which segment of the document content is labeled by the field.
By looking at the resulting MIML file, you can see what kind of metadata OmniFind Enterprise Edition attaches. To change the name or category path of the metadata, or to remove unnecessary metadata, write a custom mapping rule as described in the following section.
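The default rule amounts to a three-way decision on each metadata item. The following minimal Python sketch expresses that decision; it is an illustration, not the PEAR module's implementation. The MetadataItem type and default_mapping function are hypothetical, and forming the category path by appending the metadata name to .ofee or .offield (for example .ofee.url) is an assumption based on the dot-separated paths used elsewhere in this document.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MetadataItem:
    """Hypothetical model of one OmniFind metadata item on a crawled document."""
    name: str
    value: str
    is_field: bool                   # False: document metadata, True: field
    value_in_document: bool = False  # only meaningful for fields


def default_mapping(item: MetadataItem) -> Tuple[str, str, Optional[str]]:
    """Return (target kind, target name or path, category value) under the default rule.

    Assumption: the category path is the parent category (.ofee or .offield)
    followed by the metadata name, for example ".ofee.url".
    """
    if not item.is_field:
        # Document metadata -> category under .ofee
        return ("category", ".ofee." + item.name, item.value)
    if not item.value_in_document:
        # Field whose value is outside the document -> category under .offield
        return ("category", ".offield." + item.name, item.value)
    # Field whose value is inside the document -> text name labeling that segment
    return ("textname", item.name, None)


print(default_mapping(MetadataItem("author", "Smith", is_field=True)))
# ('category', '.offield.author', 'Smith')
```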
Custom mapping
The configuration file for the metadata mapping is $PEAR_ROOT/database/conf/ofmapping.xml, where $PEAR_ROOT is the root directory of the PEAR module installed in OmniFind Enterprise Edition ($ES_NODE_ROOT/data/pearsupport/PearIDN by default). Initially, the configuration file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<metadataMapping>
    <useDefaultMapping />
</metadataMapping>
The useDefaultMapping element specifies that the default mapping is used.
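Whether a given ofmapping.xml still relies on the default mapping can be checked by looking for this element. Below is a minimal Python sketch using the standard xml.etree.ElementTree module; the uses_default_mapping function is a hypothetical helper and is not part of the PEAR module.

```python
import xml.etree.ElementTree as ET

# The initial configuration file shown above.
DEFAULT_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<metadataMapping>
    <useDefaultMapping />
</metadataMapping>"""


def uses_default_mapping(config_xml: str) -> bool:
    """Return True if the metadataMapping root has a useDefaultMapping child."""
    root = ET.fromstring(config_xml)
    if root.tag != "metadataMapping":
        raise ValueError(f"unexpected root element: {root.tag}")
    return root.find("useDefaultMapping") is not None


print(uses_default_mapping(DEFAULT_CONFIG))  # True
```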
You can change the configuration file to update the mapping rules. Here is a sample configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<metadataMapping>
    <documentMetadata>
        <metadata>
            <name>url</name>
            <category>.url</category>
        </metadata>
    </documentMetadata>
    <fields>
        <field>
            <name>author</name>
            <textname>Authors</textname>
            <category>.doc.author</category>
        </field>
    </fields>
</metadataMapping>
There are two types of configuration: documentMetadata and fields. Each can contain zero or more entries, one for each metadata item (a metadata element under documentMetadata, a field element under fields). Each entry uses the following three elements to specify its mapping.
Element | Description
name | Name of the document metadata item or field that is the source of the mapping.
textname | The name of the text segment shown in Text Miner. If the mapping source is a field and its value is inside the document content, Text Miner labels the text segment with this name.
category | Category path that is the target of the mapping. If the mapping source is found, its value is mapped to this category. This element applies to all types of metadata.
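To check a custom configuration before deploying it, you can read the entries into a lookup table keyed by source name. The following minimal Python sketch does this for the sample configuration above with the standard xml.etree.ElementTree module. The Target type and parse_mapping function are hypothetical, and representing a missing textname or category element as None is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict, NamedTuple, Optional

# Copy of the sample configuration shown above.
SAMPLE_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<metadataMapping>
    <documentMetadata>
        <metadata>
            <name>url</name>
            <category>.url</category>
        </metadata>
    </documentMetadata>
    <fields>
        <field>
            <name>author</name>
            <textname>Authors</textname>
            <category>.doc.author</category>
        </field>
    </fields>
</metadataMapping>"""


class Target(NamedTuple):
    """Mapping target of one metadata item; None means the element is absent."""
    textname: Optional[str]
    category: Optional[str]


def _child_text(elem: ET.Element, tag: str) -> Optional[str]:
    """Stripped text of a child element, or None if it is missing or empty."""
    value = elem.findtext(tag)
    return value.strip() if value and value.strip() else None


def parse_mapping(config_xml: str) -> Dict[str, Dict[str, Target]]:
    """Group the entries of each configuration type by source name."""
    root = ET.fromstring(config_xml)
    result: Dict[str, Dict[str, Target]] = {"documentMetadata": {}, "fields": {}}
    for section, item_tag in (("documentMetadata", "metadata"), ("fields", "field")):
        for item in root.findall(f"{section}/{item_tag}"):
            name = _child_text(item, "name")
            if name is None:
                continue  # an entry without a source name cannot be mapped
            result[section][name] = Target(
                textname=_child_text(item, "textname"),
                category=_child_text(item, "category"),
            )
    return result


mapping = parse_mapping(SAMPLE_CONFIG)
print(mapping["fields"]["author"])
# Target(textname='Authors', category='.doc.author')
```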
Sample usage scenario
Let's look at a sample scenario that uses the metadata mapping. Here is a DB2 database table that contains both structured data and unstructured (text) data:
NO (integer) | NAME (varchar(50)) | VERSION (integer) | VENDOR (varchar(30)) | COMMENT (varchar(100))
1 | DB2 | 9 | IBM | Hybrid data server for both XML and relational data.
2 | WebSphere Application Server | 6 | IBM | It delivers the secure, scalable, resilient application infrastructure.

We crawl the table with the DB2 crawler of OmniFind Enterprise Edition, and then analyze the data with the IBM Content Analyzer PEAR module using a custom metadata mapping.
The table appears as follows in the crawl space view of the DB2 crawler. By default, the column names are used as the field names.

We use the following mapping configuration file. It maps the document metadata url to the category .url. It also gives the text name Vendor to the field vendor and maps that field to the category .company.

<?xml version="1.0" encoding="UTF-8"?>
<metadataMapping>
    <documentMetadata>
        <metadata>
            <name>url</name>
            <category>.url</category>
        </metadata>
    </documentMetadata>
    <fields>
        <field>
            <name>vendor</name>
            <textname>Vendor</textname>
            <category>.company</category>
        </field>
    </fields>
</metadataMapping>
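The effect of this configuration on the first table row can be traced with a small simulation. The following minimal Python sketch is not the PEAR module's code; the apply_mapping function, the lowercase field names, the URL value, and the rule that items not listed in the configuration are dropped are all assumptions made for illustration. As in the element table, category applies to every listed item, while textname is used only for a field whose value is inside the document.

```python
import xml.etree.ElementTree as ET

# Copy of the scenario configuration shown above.
SCENARIO_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<metadataMapping>
    <documentMetadata>
        <metadata>
            <name>url</name>
            <category>.url</category>
        </metadata>
    </documentMetadata>
    <fields>
        <field>
            <name>vendor</name>
            <textname>Vendor</textname>
            <category>.company</category>
        </field>
    </fields>
</metadataMapping>"""


def apply_mapping(config_xml, doc_metadata, fields):
    """Return (categories, text_names) for one crawled document (hypothetical).

    doc_metadata: {name: value}
    fields: {name: (value, value_in_document)}
    Assumption: metadata not listed in the configuration is dropped.
    """
    root = ET.fromstring(config_xml)
    categories, text_names = {}, {}
    for item in root.findall("documentMetadata/metadata"):
        name, category = item.findtext("name"), item.findtext("category")
        if name in doc_metadata and category:
            categories[category] = doc_metadata[name]
    for item in root.findall("fields/field"):
        name = item.findtext("name")
        if name not in fields:
            continue
        value, in_document = fields[name]
        category, textname = item.findtext("category"), item.findtext("textname")
        if category:
            categories[category] = value      # category applies to all metadata
        if textname and in_document:
            text_names[textname] = value      # label for the in-document segment
    return categories, text_names


# Row 1 of the table; the URL value is a hypothetical placeholder.
categories, text_names = apply_mapping(
    SCENARIO_CONFIG,
    {"url": "hypothetical-url-row-1"},
    {"name": ("DB2", True), "vendor": ("IBM", True)},
)
print(categories)  # {'.url': 'hypothetical-url-row-1', '.company': 'IBM'}
print(text_names)  # {'Vendor': 'IBM'}
```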

With an appropriate category tree definition, the crawled data is displayed in Text Miner as follows:

In Text Miner, you can see the categories 'Company' and 'OmniFind URL', which represent the categories .company and .url defined in the mapping configuration file. Each document in the right-hand pane also shows the named text 'Vendor' in addition to the whole content, as specified by the textname element in the mapping configuration file.
Now that these metadata are mapped into categories, you can use them for further text analysis.