Integration with FileNet P8

IBM Content Analyzer can retrieve documents that are stored on a FileNet P8 server and create ATML files from them for text analysis. It can also write the category information that results from text analysis back to the FileNet P8 server. This manual describes how to use this function.
Capability
With IBM Content Analyzer, you can:
Sample configuration file
The following sample configuration file is used to communicate with FileNet P8 servers:
<?xml version="1.0" encoding="UTF-8"?>
<filenetBridgeConfiguration version="1">
    <server>
        <url>http://localhost:9080/wsi/FNCEWS40DIME</url>
        <username>user</username>
        <password>password</password>
    </server>
    <domain></domain>
    <objectStore>MyObjectStore</objectStore>
    <documentSelection>
        <folders>
            <folder>/docs/analyze</folder>
        </folders>
    </documentSelection>
    <contentMapping name="DocumentContent">
        <textContentPattern encoding="MS932">text/plain</textContentPattern>
        <textContentPattern encoding="UTF-8">text/html</textContentPattern>
        <binaryContentPattern>^application/pdf$</binaryContentPattern>
    </contentMapping>
    <propertyMappings>
        <propertyMapping>
            <symbolicName>DocumentTitle</symbolicName>
            <mappingTarget>
                <title />
            </mappingTarget>
        </propertyMapping>
        <propertyMapping>
            <symbolicName>DateCreated</symbolicName>
            <mappingTarget>
                <date />
            </mappingTarget>
        </propertyMapping>
        <propertyMapping>
            <symbolicName>Comment</symbolicName>
            <mappingTarget>
                <text name="Comment" />
            </mappingTarget>
        </propertyMapping>
    </propertyMappings>
    <outputATML>
        <basename>filenet_data</basename>
        <maxDocuments>2000</maxDocuments>
    </outputATML>
    <categoryRecord property="Category">
        <serialOperation ignoreError="false" />
    </categoryRecord>
</filenetBridgeConfiguration>
This configuration file specifies the following information:
Assuming the sample file is saved as "config.xml", you can retrieve FileNet documents with the following command:
takmi_filenet2atml config.xml
After text analysis of the resulting ATML files, you can write the category information from the generated MIML file back to FileNet with the following command:
takmi_miml2filenet config.xml filenet_data_XXXX.miml
On Linux and AIX, add .sh to the command names.
Configuration file format
The top-level filenetBridgeConfiguration element contains the following elements:
  1. server
  2. domain
  3. objectStore
  4. documentSelection
  5. contentMapping
  6. propertyMappings
  7. outputATML
  8. categoryRecord
These elements are described below.
server element
The server element specifies the FileNet P8 server information:
Server specification
Element | Description
url | URL of the FileNet P8 server web services interface.
username | User name for the server.
password | Password for the server.
domain and objectStore elements
The domain and objectStore elements specify the repository location. If the domain element is empty, the default domain is used.
documentSelection element
The documentSelection element specifies the set of target documents. You can select the documents from a folder, or you can use an SQL query to find the documents.
Target document specification
Element | Attribute | Description | Cardinality or type
folders | | Contains one or more folder elements. | 0..1
folder | | Path of the target FileNet folder. | 0..n
 | recursive | Whether to select documents recursively from subfolders. | Boolean, defaults to true
querySQL | | Custom SQL query that selects FileNet documents. The SELECT list should include the Id column, which is used to generate document IDs. See the FileNet manuals for details. | 0..1
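The two selection styles described above might be written as in the following sketch. It is illustrative only: the placement of the recursive attribute on the folder element is inferred from the table, and the query text (including the use of the Document class and the DocumentTitle property) is a hypothetical example, not taken from the product documentation.

```xml
<!-- Alternative 1 (assumed syntax): select documents in one folder only,
     without descending into subfolders. -->
<documentSelection>
    <folders>
        <folder recursive="false">/docs/analyze</folder>
    </folders>
</documentSelection>

<!-- Alternative 2 (assumed syntax): select documents with a custom query.
     The Id column is in the SELECT list so that document IDs can be generated.
     The WHERE clause is a hypothetical example. -->
<documentSelection>
    <querySQL>SELECT Id FROM Document WHERE DocumentTitle LIKE 'Report%'</querySQL>
</documentSelection>
```

Only one documentSelection element would appear in a real configuration file; the two are shown together here for comparison.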
contentMapping element
The following elements and attributes for the contentMapping element specify how the tool extracts text segments from files that are uploaded as FileNet documents.
Content mapping specification
Element | Attribute | Description | Cardinality or type
contentMapping | | Top-level element of the content mapping configuration. This element contains all the elements and attributes that are described in this table. | 0..1
 | name | Name of the text that is created during content extraction. This name will be shown in the MINER application. | string
 | maxLength | Maximum length of the extracted text. The text cannot be longer than this value. | integer, defaults to 65535
contentReplacement | | Replaces specified characters in the retrieved contents by using Java™ regular expression syntax. | 0..1
 | pattern | Regular expression pattern to be replaced. | string
 | replacement | Replacement characters. This attribute can contain references to capturing groups in the pattern attribute. | string
textContentPattern | | Regular expression pattern of the MIME type of the files to be considered as text files. | 0..n
 | encoding | The encoding of the text file. | string, defaults to "UTF-8"
binaryContentPattern | | Regular expression pattern of the MIME type of the files to be considered as binary files. | 0..n
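As a sketch of how these settings could be combined, the following fragment limits the extracted text length and collapses runs of whitespace into a single space. The element and attribute names come from the table above, but the nesting of contentReplacement inside contentMapping, the order of the child elements, and the specific values are assumptions made for illustration.

```xml
<!-- Assumed layout: attributes and child elements as listed in the table above. -->
<contentMapping name="DocumentContent" maxLength="32768">
    <!-- Java regular expression: replace each run of whitespace with one space. -->
    <contentReplacement pattern="\s+" replacement=" " />
    <!-- Treat every text/* MIME type as UTF-8 text. -->
    <textContentPattern encoding="UTF-8">^text/.*$</textContentPattern>
    <!-- Treat PDF files as binary content. -->
    <binaryContentPattern>^application/pdf$</binaryContentPattern>
</contentMapping>
```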
propertyMappings element
The propertyMappings element and its propertyMapping child elements specify how the tool maps FileNet document properties to the resulting ATML documents.
Property mapping specification
Element | Attribute | Description | Cardinality or type
propertyMappings | | Top-level element of the property mapping configuration. This element contains zero or more propertyMapping elements. | 0..1
propertyMapping | | Represents one property mapping. It specifies a FileNet document property as the mapping source and one or more mapping targets as child elements. The child elements are described in the following rows of this table. | 0..n
symbolicName | | FileNet property name that is the source of the mapping. The normalized symbolic name is used rather than the display name. | 1
mappingTarget | | The target of the mapping. Because one mapping can have multiple targets, this element can contain several of the elements listed below as child elements. | 1
standardFeature | | Standard feature of IBM Content Analyzer as the mapping target. | 0..1
 | category | Category path of the standard feature. | string
dynamicPath | | Allows flexible mapping to the category path based on the property value. | 0..n
 | value | FileNet property value to be matched. If a property value matches this configuration, a standard feature will be generated with the specified category path. This value will be the value of the generated standard feature. | string
 | category | Category path of the standard feature. | string
text | | Maps a FileNet property value to a text in ATML documents. Unlike the other mapping targets, the text is subject to text analysis. Because an ATML document can have multiple texts, one or more FileNet properties can be mapped to ATML texts. | 0..1
 | name | Name of the text shown in the MINER application. | string
date | | Maps a FileNet property value to the special date property in ATML documents. The property type must be DateTime or String (in the IBM Content Analyzer string format for dates). | 0..1
title | | Maps a FileNet property value to the title in ATML documents. | 0..1
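Because one mappingTarget can hold several targets, a single FileNet property might be mapped to both a standard feature and an analyzable text, as in the sketch below. The property name Department and the category path are hypothetical, and the syntax of category paths is an assumption; check the values against your own object store and category tree.

```xml
<propertyMappings>
    <propertyMapping>
        <!-- Hypothetical FileNet property (symbolic name, not display name). -->
        <symbolicName>Department</symbolicName>
        <mappingTarget>
            <!-- Record the value as a standard feature; the category path is illustrative. -->
            <standardFeature category=".department" />
            <!-- Also expose the value as a text that is subject to text analysis. -->
            <text name="Department" />
        </mappingTarget>
    </propertyMapping>
</propertyMappings>
```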