IBM Content Analyzer provides a function which integrates with relational databases and business intelligence tools.
IBM Content Analyzer generates an XML format file called Mining Markup Language (MIML) as the output of ICA Natural Language Processing (NLP). This MIML file contains the annotations that are created by the NLP process. This MIML file can be shred into the tables of a relational database. The tables are linked to each in the form of a start schema (OLAP schema) and hence can be used by business intelligence tools. This integration helps users to gain more insight into the unstructured text after it is analyzed by IBM Content Analyzer. The OLAP schema that is created by running the takmi_miml_database tool can be used by reporting tools such as IBM DB2 Alphablox or IBM Cognos.
IBM Content Analyzer provides a tool, takmi_miml_database, that helps you integrate IBM Content Analyzer with relational databases. The tool is provided as a stand-alone command line tool that operates on the Java Virtual Machine (JVM) for each supported operating system. Relational tables are created by default and form an OLAP schema (star schema). Each table is a dimension in the OLAP schema. The tool reads each text field of the MIML file and stores the annotations into these tables. You can add new dimension tables for any structured attributes. The default dimension tables that are created are DOCUMENT, DATE, UCAT. DOCUMENT_ANALYSIS is the fact table for the OLAP schema.
The relational databases that are supported by the tool are DB2 and Oracle. Integration with IBM DB2 Alphablox and IBM Cognos is also supported.
This figure illustrates a system overview of integration of IBM Content Analyzer with relational databases.
Prerequisite software
IBM Content Analyzer V8.4.2
IBM DB2 V9
IBM DB2 Alphablox V9.5
IBM Cognos 8 (BI Server and Framework Manager)
Operational Procedure
1. Modify the database_config.xml file.
The TAKMI_HOME/databases/DATABASE_NAME/conf/database_config.xml file must be modified as mentioned below.
<application_entry name="Dictionary"
- impl="com.ibm.takmi.impl.std.dic.edit.config.StandardDictionaryFactoryImpl"/>
<application_entry name="Indexer"
- impl="com.ibm.takmi.impl.rdb.idx.config.RdbIndexerFactoryImplDb2Oracle"/>
<application_entry name="AlertingSystem"
- impl="com.ibm.takmi.impl.std.alerting.config.StandardAlertingFactoryImpl"/>
<application_entry name="NLPResource"
- impl="com.ibm.takmi.impl.std.nlprsc.config.StandardNLPResourceFactoryImpl"/>
</application_entries>
| |
2. Define the user-created dimensions.
The tool that shreds annotations into tables creates a default star schema by using the following dimension and fact tables:
DATE (dimension table)
DOCUMENT (dimension table)
UCAT_BRIDGE (bridge table)
UCAT (dimension table), and
DOCUMENT_ANALYSIS (fact table)
In addition to the default dimension tables, you can create or add new dimension tables only for structured attributes that will be added to the default star schema. The features attribute of the TAKMI_HOME/databases/DATABASE_NAME/category/category_tree.xml file must be assigned the value dimension. For example, the following snippet of the category_tree.xml file creates two new structured dimension tables CAT_COUNTRY and CAT_ORGANIZATION. Each table uses the annotations of the Country and Organization categories.
<category_tree>
.
.
<node id="40" path="country" name="Country" features="dimension"/>
<node id="41" path="org" name="Organization" features="dimension>"/>
<node id="60" path="tkm_en_appl_sample_voc" name="Voice of Customers (sample)" features="">
- <node id="61" path="bad" name="Unfavorable" features="">
- <node id="62" path="word" name="word" features=""/>
- <node id="63" path="phrase" name="phrase" features=""/>
</node>
.
.
</category_tree>
|
The dimension table CAT_COUNTRY is based on the category country and CAT_ORGANIZATION is based on org. The following diagram shows the relational tables that are created by the tool that form the OLAP schema. In this diagram CAT_COUNTRY and CAT_ORGANIZATION are the user-created dimensions and all other tables are created by default.
For more information about how to create categories, see Section 2.3 "Designing a Category Tree of Operation Guide".
3. Run the tool
A wrapping shell, takmi_miml_database, is provided to invoke the Java program to shred the MIML file into the relational tables.
Open a command window (on Windows) or shell (on AIX®/Linux) to run the following commands:
Windows:
> takmi_miml_database.bat DATABASE_DIRECTORY MIML_FILE [OPTIONS]
AIX/Linux:
> takmi_miml_database.sh DATABASE_DIRECTORY MIML_FILE [OPTIONS]
|
|
where:
-
DATABASE_DIRECTORY: the path of the database directory
-
MIML_FILE: the MIML file from which annotations are to be stored into relational tables
You can also use the following options:
- -databaseName: the name of the database
- -driverType: the database driver type for either DB2 or Oracle
- -serverName: the server name of the database
- -portNumber: the port number of the database
- -user: the user name that has access to the database
- -password: the password for the user name
- -schema: the schema name for the specified database
- -vendorName: enter either DB2 or ORACLE, the default is DB2 if omitted
- -append: enter either YES or NO, the default is NO if omitted
Tip: The additional options can also be set in the TAKMI_HOME/databases/DATABASE_NAME/conf/database_config.xml file. Options specified in the command prompt override the options specified in the TAKMI_HOME/databases/DATABASE_NAME/conf/database_config.xml file.
Integration with business intelligence tools
The annotations that are stored in the relational tables of the OLAP schema can be analyzed by business intelligence reporting tools.
You can integrate IBM Content Analyzer with IBM DB2 Alphablox and IBM Cognos. The integration is provided for the default schema. However, if you add new dimensions, you must manually add to the default cube model that is generated after the integration.
To integrate with IBM DB2 Alphablox:
1. Create a new directory under the $ALPHABLOX_HOME/repository/cubes directory with the directory name being the lower-case name of the cube that you want to create.
2. Copy the file TAKMI_HOME/conf/abxcube.properties file into the directory that you created in step 1
3. Rename the abxcube.properties file to <lowercase_name_of_the_cube>.properties. - For example, if you want to create MYCUBE cube, create a directory mycube under $ALPHABLOX_HOME/repository/cubes and rename abxcube.properties file as mycube.properties.
4. Update the .properties file as follows
-
Replace SCHEMA_NAME value with the schema name of the database
-
Assign the relational data source name to the field ext.datasource
-
Assign the cube name to the field name
5. Open the IBM DB2 Alphablox administration page and click the Data Sources tab. Create a new relational source for the database where the tables are created.
6. Go to the Cubes tab from IBM DB2 Alphablox administration page, select the cube that you created and click Edit button.
7. Select the relational data source from the list under Relational Data Source. Click OK.
8. Start the cube.
9. Open IBM Alphablox Query Builder to do your analysis.
To integrate with IBM Cognos:
1. Replace the SCHEMA_NAME value in the TAKMI_HOME/conf/model.xml with the schema name of the tables
2. Replace the CUBE_NAME value with the cube name that you want to create
3. Replace the DATASOURCE_NAME value with the data source you have created on IBM Cognos. This data source must refer to the relational database where the annotations are stored.
4. Create a new project in IBM Cognos
5. In the Metadata Wizard - Select Metadata Source, select Cognos 8 Model. Click Next
6. In the Metadata Wizard - Select Model, select TAKMI_HOME/conf/cognosproject.cpf file. Click Next
7. In the Metadata Wizard - Select Objects, select the objects you want to import. Click Next
8. Click Finish.
9. Publish the package created.
11. Open IBM Cognos report studio to do your analysis.