Edition Notice
First Edition (February 2007)
This edition applies to version 8, release 4 of IBM® OmniFind™ Analytics Edition and to all subsequent releases and modifications until otherwise indicated in new editions.
This document contains proprietary information of IBM. This proprietary information is provided in accordance with the license conditions and is protected by copyright. Information contained in this document provides no warranties whatsoever for any products. Also, no descriptions provided in this document should be interpreted as product warranties. Depending on the system environment, the yen symbol may be displayed as the backslash symbol, or the backslash symbol may be displayed as the yen symbol.
© Copyright International Business Machines Corporation 2007. All rights reserved.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
1 Introduction
This document describes how to use the IBM OmniFind Analytics Edition Alerting System application.
1.1 Pop-up Help
If you click the help button next to an item on the Alerting System screen, a description of the selected item is displayed. You can use this pop-up help to learn definitions of terms without opening the online help.
1.2 Functional Overview
The Alerting System, which focuses on finding problems, supports heavy-load batch analysis that cannot be performed by the interactive operations of Text Miner. You use the Alerting System user interface to specify the categories and parameters to be used in analyses.
-
Increase detection function:
Among a large number of keywords (including phrases), this function detects keywords whose use has recently been increasing. For example, among the "Noun" category keywords contained in call center data, it detects the keywords whose weekly frequency of use is rising over time. The results of this analysis are useful for identifying future problems. Although the increase analysis function of Text Miner can analyze up to 50 keywords in the form of a graph that shows increases and decreases, the Alerting System can analyze up to 20,000 keywords.
-
Correlation detection function:
This function extracts combinations of keywords and subcategories that are highly correlated to each other. For example, by extracting the correlation between "product" category keywords and "part/phrase" category keywords from a manufacturer's call center data, expressions to describe product defects that are frequently used in association with a particular product can be extracted. Although the Text Miner correlation analysis function can analyze up to 100 keywords along the vertical axis x 100 keywords along the horizontal axis in a two-dimensional map, the Alerting System can analyze up to 1,000 keywords x 12,000 keywords.
1.3 Page Transition
The Alerting System user interface provides the following screens for specifying alert extraction options.
-
In the Select Database screen, select the database to be used for alert extraction. You must go through this screen even if there is only one database.
-
In the Analysis Type Selection screen, select either Increase detection or Correlation detection as the type of alert extraction. Because only one person at a time can edit the settings for a particular extraction type for a particular database, this screen shows, for each extraction type, whether anyone is currently editing the settings.
-
Edit the categories and parameters to be subject to increase detection in the Increase Detection Entry List screen, Increase Detection Edit Category screen, and the Increase Detection Edit Parameters screen. For each database, only one user can use these screens to specify settings, and this user will lock the settings.
-
The Increase Detection Alert Results screen shows a list of increased keywords that were extracted in accordance with the entry settings. From each of these keywords, it is possible to connect to the Time Series view in Text Miner.
-
Edit the categories and parameters to be subject to correlation detection in the Correlation Detection Entry List screen, Correlation Detection Edit Category screen, and the Correlation Detection Edit Parameters screen. For each database, only one user can use these screens to specify settings, and this user will lock the settings.
-
The Correlation Detection Alert Results screen shows a list of combinations of highly-correlated keywords and categories that were extracted in accordance with the entry settings. From each of these items, it is possible to connect to the Docs view of relevant documents in Text Miner.
1.4 Database Selection
In the Select Database screen, specify the database for which to configure alert settings. For online help, click the help button located to the right of the screen title. Select Database screen:
1.5 Analysis Type Selection
In the Analysis Type Selection screen, select an extraction type to be used in the specified database, either increase detection or correlation detection. The following figure shows the Analysis Type Selection screen in the CALL_CENTER database. Analysis Type Selection screen:
-
Lock:
Click either Increase detection or Correlation detection to lock the settings (mark the settings so that no other users can edit the settings) and start editing the settings. When the settings are locked by another user, the message "Locked (edited) by user_name" is displayed in the status field. When an unregistered user is editing, the message "Locked (edited) by anonymous_user" is displayed.
-
When the previous user exits edit mode while the settings are still locked:
If a user closes and exits the browser from the screen marked with a dotted line in after specifying detection settings, the settings will stay locked by that user. When this occurs, edit the settings by using the interrupt edit function.
-
Interrupt edit:
Even if the status field does not show that the settings are "Editable," clicking the link for the extraction type will present a dialog that asks whether you want to interrupt the edit. Click OK to cancel the settings currently being edited by another user and start editing. Interrupt edit confirmation dialog:
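The lock and interrupt edit behavior described above can be modeled in a few lines of Python. This is an illustrative sketch only: the class `SettingsLock` and its method names are hypothetical and are not part of the product, whose internal implementation is not described in this document.

```python
from typing import Optional


class SettingsLock:
    """Toy model of the edit lock held per database and per extraction type."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None  # None means the settings are "Editable"

    def status(self) -> str:
        """Return the text that the status field would show in this model."""
        if self.owner is None:
            return "Editable"
        return f"Locked (edited) by {self.owner}"

    def start_edit(self, user: Optional[str], interrupt: bool = False) -> bool:
        """Try to take the lock; return True if the caller now holds it."""
        name = user or "anonymous_user"  # unregistered users are shown as anonymous_user
        if self.owner is not None and self.owner != name and not interrupt:
            return False  # another user is editing and the caller did not interrupt
        self.owner = name  # an interrupt cancels the other user's in-progress edit
        return True

    def finish_edit(self) -> None:
        """Release the lock when the user leaves edit mode normally."""
        self.owner = None


lock = SettingsLock()
lock.start_edit("alice")                       # alice locks the settings
print(lock.status())
print(lock.start_edit("bob"))                  # refused while alice holds the lock
print(lock.start_edit("bob", interrupt=True))  # bob confirms the interrupt dialog
print(lock.status())
```

In this model, closing the browser without calling `finish_edit` leaves `owner` set, which corresponds to the situation in which the interrupt edit function is needed.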
2 Increase Detection
2.1 Increase Detection
In increase detection, keywords are detected from a maximum of 20,000 keywords and subcategories according to their ranking by "increase indicator within the latest time period" (a ranking of the keywords whose frequency of appearance has increased as of the latest date), based on the Delta view of Text Miner. Because the keywords and subcategories detected in increase detection are likely to appear more frequently in the future, examining them can lead to early detection of problems. In the increase detection settings, specify a "vertical category" for the Delta view, and analyze the frequency of appearance of the keywords belonging to that category and its subcategories (including the ones that are more than two levels below) in Time Series. Text Miner Delta view:
2.2 Increase Detection Entry List
-
Increase detection entry:
In increase detection, a keyword group and subcategory group from a particular category are specified as the target of detection. This category and the set of parameter settings associated with detection are collectively managed as an entry.
Increase detection entry = category + parameter group
For example, when "Noun" is specified as the category and the number of keywords to be analyzed is set to 10,000, keywords whose use has recently been increasing can be detected from among 10,000 noun keywords. The "Noun" category, together with parameters such as the number of keywords to be analyzed (10,000), forms the "Word (weekly)" entry.
-
Entry operation:
In the Increase Detection Entry List screen, you can use the New Entry button and Delete link to add and delete entries, respectively. Although a new entry to be added to the list needs an entry name to be displayed in the list, you can change the name when you specify parameter settings later on.
-
Edit:
You can edit categories by clicking the Category link, and you can edit parameters by clicking the Parameters link.
2.3 Increase Detection Edit Category
2.4 Increase Detection Edit Parameters
Set the analysis parameters in the Increase Detection Edit Parameters screen. To check the definition of a parameter, click the help button next to it.
-
Saving the parameter settings:
After editing the parameters by using the text field or radio buttons, click the Save button to save the changes and return to the Increase Detection Entry List screen.
-
Canceling the parameter changes:
After editing the parameters by using the text field or radio buttons, click the Cancel button to cancel the parameter changes and return to the Increase Detection Entry List screen.
-
Entry name:
Specify the names of the entries to be listed in the Increase Detection Entry List screen.
-
Category:
Categories set in category edit mode are displayed. Categories cannot be edited in this screen. In increase detection, changes over time in the keywords and subcategories of the specified category are analyzed to detect the ones whose rate of increase in frequency of use has been rising notably.
-
Maximum number of alerts:
Specify the maximum number of alerts to be detected. Keywords or subcategories whose frequency of use is notably increasing are detected as alerts. Detection is not based on a threshold, but instead, the keywords or subcategories whose increase indicator is ranked within the top N are detected. For example, if you set the maximum number of alerts to 20, the top 20 keywords or subcategories in which the rate of increase in frequency of use has been notably rising will be returned.
-
Maximum number of keywords:
Specify the maximum number of keywords or subcategories to be analyzed. For example, when the number of keywords to be analyzed is set to 5,000, the keywords and subcategories with a high rate of increase in frequency of use are detected among 5,000 keywords and subcategories.
-
Minimum frequency:
Each alert has an associated collection of documents retrieved under the conditions described above. Because alerts backed by only a few documents might be meaningless, specify the minimum number of documents that an alert must have in order to be detected. Alerts are not returned if the number of applicable documents is below the set value.
-
Decaying factor:
In increase detection, keywords and subcategories that "are used relatively more frequently recently than their past average frequency" are detected. This parameter specifies how far back older data is taken into account when obtaining the past average frequency: the larger the value, the more weight past values carry. For example, when the decaying factor is 0.8 and the time scale is set to "month," the frequency obtained two months ago is given 0.8 times the weight of the frequency obtained in the previous month when calculating the average.
-
Time scale:
Select monthly, weekly, or daily as the interval at which frequency data is acquired when analyzing the time-series frequency of use of keywords and subcategories. Select a long time scale to analyze a slow increase in frequency or keywords that are not used frequently. Select a short time scale to analyze a short-term increase in frequency or keywords that are used moderately or very frequently.
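The interaction of the decaying factor, the minimum frequency, and the maximum number of alerts can be sketched as follows. This document does not give the formula for the increase indicator, so the ratio used here (latest frequency divided by the decay-weighted past average plus one) is an assumption made for illustration, as are the function names and the sample data. The sketch also assumes that the latest frequency stands in for the number of applicable documents.

```python
from typing import Dict, List, Tuple


def decayed_average(past: List[float], decay: float) -> float:
    """Weighted mean of past frequencies, listed oldest first.

    The most recent past period has weight 1, the period before it has
    weight `decay`, the one before that `decay ** 2`, and so on.
    """
    if not past:
        raise ValueError("at least one past period is required")
    weights = [decay ** age for age in range(len(past) - 1, -1, -1)]
    return sum(w * f for w, f in zip(weights, past)) / sum(weights)


def top_alerts(series: Dict[str, List[float]], decay: float,
               max_alerts: int, min_frequency: int) -> List[Tuple[str, float]]:
    """Rank keywords by an ASSUMED increase indicator and keep the top N."""
    scored = []
    for keyword, freqs in series.items():
        if len(freqs) < 2:
            continue  # no past data to compare against
        *past, latest = freqs
        if latest < min_frequency:
            continue  # too few documents to be a meaningful alert
        # Assumed indicator; the +1.0 avoids division by zero for new keywords.
        indicator = latest / (decayed_average(past, decay) + 1.0)
        scored.append((keyword, indicator))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:max_alerts]  # top-N selection, not a threshold


# Hypothetical per-period frequencies, oldest first.
sample = {
    "battery": [4, 5, 6, 20],
    "screen": [10, 10, 10, 11],
    "hinge": [0, 0, 1, 2],
}
for keyword, indicator in top_alerts(sample, decay=0.8, max_alerts=2, min_frequency=3):
    print(keyword, round(indicator, 2))
```

With a decaying factor of 0.8, the three past periods of "battery" receive weights 0.64, 0.8, and 1, matching the weighting described for the Decaying factor parameter.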
2.5 Increase Detection Alert Results
Alert result display method:
After entries for increase detection are set, a View link is generated inside the Result field in the Increase Detection Entry List screen for the entries for which alerts are extracted as a result of the batch processing (the timing of the processing depends on the operation). Click the View link to see the list of alert results.
Link from a batch-processed entry:
Alert result:
-
Settings that are effective when the batch processing is run are shown in the top-left table.
-
The processing status is shown in the top-right table.
-
Keywords and subcategories for which an increase in frequency of use was detected are shown in each line of the bottom table, and these keywords and subcategories are arranged in descending order of the increase indicator value.
-
Categories selected for a particular entry are shown in the Noun field in the previous figure. Extracted keywords and subcategories are shown in each line, and an identifier such as "XXX (keyword)" and "XXX (category)" is shown for each of them.
-
The Frequency field shows the frequency of keyword use on the latest date of time-series analysis. For example, when the time scale is set for weekly analysis, this field shows the frequency of use of the keyword in the corresponding line for the latest week.
-
The Increase indicator field shows the index to describe how much the frequency of use of that keyword has increased.
-
The Jump field is linked to Text Miner, and clicking it shows the Time Series view, which is generated based on the keyword in that line.
3 Correlation Detection
3.1 Correlation Detection
In correlation detection, combinations of keywords and subcategories that are highly correlated to each other are extracted. The analysis is roughly equivalent to the one carried out in the Text Miner two-dimensional (2D) map with "Max lines to display" set to its largest value for both the vertical and horizontal axes, but it examines correlations for up to the equivalent of 1,000 x 12,000 cells.
Text Miner 2D map view:
In addition to the scale, differences between correlation detection and the 2D map are as follows:
-
Only the keywords and subcategories immediately under the categories arranged along the vertical and horizontal axes are analyzed in the 2D map, but in correlation detection, subcategories that are more than two levels down in the category hierarchy can be analyzed. For this reason, correlations in intermediate levels in the hierarchy can be analyzed by defining the product classification levels in terms of categories. For example, assume that there are product classifications as shown below:
Category tree
  Product
    T-shirt
      ABC-T-shirt
      XYZ-T-shirt
        XYZ-T-shirt L
        XYZ-T-shirt M
        XYZ-T-shirt S
    Jacket
    Shoes
If there is a strong correlation between "XYZ-T-shirt L" and the phrase "size … large," only the size indication for the large-size products might have a problem. If, however, there is a strong correlation between "XYZ-T-shirt" and "size … large," the overall size indication system for the XYZ brand might have a problem.
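The idea that a correlation can be examined at any level of the hierarchy can be illustrated with a small Python sketch. The nested dictionary below is one reading of the example product classification, and the function name `path_to` is hypothetical. The sketch only lists the levels at which a given subcategory could be analyzed; it does not reproduce how the product stores categories.

```python
from typing import Dict, List

# One reading of the example category tree, encoded as nested dictionaries.
TREE: Dict[str, dict] = {
    "Product": {
        "T-shirt": {
            "ABC-T-shirt": {},
            "XYZ-T-shirt": {
                "XYZ-T-shirt L": {},
                "XYZ-T-shirt M": {},
                "XYZ-T-shirt S": {},
            },
        },
        "Jacket": {},
        "Shoes": {},
    }
}


def path_to(tree: Dict[str, dict], target: str) -> List[str]:
    """Return the chain of categories from the top level down to `target`."""
    for name, children in tree.items():
        if name == target:
            return [name]
        below = path_to(children, target)
        if below:
            return [name] + below
    return []  # target not found under this subtree


# Each element of the path is a level at which a correlation could appear.
print(path_to(TREE, "XYZ-T-shirt L"))
# → ['Product', 'T-shirt', 'XYZ-T-shirt', 'XYZ-T-shirt L']
```

A correlation found only at the last element points to a problem with one size, whereas a correlation found at "XYZ-T-shirt" points to the brand as a whole.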
-
In the 2D map, the vertical axis and the horizontal axis are treated equally, and the keywords and subcategories of the category specified as the horizontal category are arranged along the horizontal axis. In correlation detection, however, the analysis axes are divided into the compared category and the content category. Keywords and subcategories in the compared category are compared from the perspective of keywords and subcategories in the content category. Also, up to two categories can be specified as content categories, so what corresponds to the horizontal axis of the 2D map can contain not only individual keywords and subcategories but also pairs of these elements. See the following figure for details.
When a correlation analysis is visualized using the 2D map:
Compared category: product
First content category: noun phrase
Second content category: purchase history
3.2 Correlation Detection Entry List
-
Correlation detection entry:
In correlation detection, a set consisting of a compared category, content categories, and the parameters associated with detection is managed as an entry. Refer to 3.1 Correlation Detection for a description of compared categories and content categories.
Correlation detection entry = compared category + content category + parameter group
-
Entry operation:
In the Correlation Detection Entry List screen, you can use the New Entry button and Delete link to add and delete entries, respectively. Although a new entry to be added to the list needs an entry name to be displayed in the list, you can change the name when you specify parameter settings later on.
-
Edit:
You can edit categories (compared categories and content categories) by clicking the Category link, and you can edit parameters by clicking the Parameters link.
3.3 Correlation Detection Edit Category
3.4 Correlation Detection Edit Parameters
Set the analysis parameters in the Correlation Detection Edit Parameters screen. To check the definition of a parameter, click the help button next to it.
-
Saving the parameter settings:
After editing the parameters by using the text field or radio buttons, click the Save button to save the changes and return to the Correlation Detection Entry List screen.
-
Canceling the parameter changes:
After editing the parameters by using the text field or radio buttons, click the Cancel button to cancel the parameter changes and return to the Correlation Detection Entry List screen.
-
Entry name:
Specify the names of the entries to be listed in the Correlation Detection Entry List screen.
-
Compared category:
Compared categories set in category edit mode are displayed. Compared categories cannot be edited in this screen. Refer to 3.1 Correlation Detection for a description of compared categories.
-
Content category:
Content categories set in category edit mode are displayed. Content categories cannot be edited in this screen. Refer to 3.1 Correlation Detection for a description of content categories.
-
Maximum number of alerts:
Specify the maximum number of alerts to be detected. Alerts are detected as combinations of [keyword or subcategory of the compared category, keyword or subcategory (or two subcategories) of the content category], and returned in the order of strength of correlation between the compared category and the content category up to the number specified in this setting. For example, if you set the maximum number of alerts to 20, the top 20 combinations of highly-correlated keywords and subcategories will be returned.
Meaning of the parameters when a correlation analysis is visualized using the 2D map:
(1): number of compared category keywords
(2): number of content category keywords
(3): number of pairs of content category keywords
-
Maximum number of keywords in compared category:
Set the maximum number of compared category keywords or subcategories to be used in the analysis. For example, when the Model category is set as the compared category and at the same time the number of compared category keywords is set to 200, the 200 most frequently mentioned models will be the subject of the analysis.
-
Maximum number of keywords in content category:
Set the maximum number of content category keywords or subcategories to be used in the analysis (up to two content categories can be specified). For example, when the Bad Reputation and Part Name categories are set as the content categories and at the same time the number of content category keywords is set to 200, up to 200 types of bad reputations and up to 200 part names will be the subject of the analysis.
-
Maximum number of keyword pairs in content category:
When two content categories are specified, set the maximum number of pairs consisting of the keywords and subcategories of the first content category and the keywords and subcategories of the second content category. For example, when the number of pairs of content category keywords is set to 10,000 while Bad Reputation and Part Name are specified as the content categories, up to 10,000 pairs consisting of [expression of bad reputation, part name] will be the subject of the analysis, and an alert will be returned if a strong correlation is detected either in the compared category keywords and subcategories or in the 10,000 pairs.
-
Minimum frequency:
Each alert has an associated collection of documents retrieved under the conditions described above. Because alerts backed by only a few documents might be meaningless, specify the minimum number of documents that an alert must have in order to be detected. Alerts are not returned if the number of applicable documents is below the set value.
-
Confidence coefficient:
This is a parameter used when statistically calculating a correlation value (correlation strength). If the coefficient is set high, the alerts returned tend to be those that have enough applicable documents for calculating the correlation strength. If it is set low, many alerts with only a slight possibility of correlation are returned, even when there are not enough applicable documents for calculating the correlation strength.
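The three "maximum number" parameters above together bound how many combinations an analysis can examine. The following Python sketch computes that upper bound under the assumption that every compared category item is checked against every single content item and every content pair. The function name and this counting model are illustrative assumptions, not a description of the product's internals.

```python
def max_candidate_combinations(compared: int, content_first: int,
                               content_second: int = 0, pairs: int = 0) -> int:
    """Upper bound on (compared item, content item or pair) combinations.

    compared       -- maximum number of keywords in the compared category
    content_first  -- maximum number of keywords in the first content category
    content_second -- the same for the second content category (0 if unused)
    pairs          -- maximum number of keyword pairs in the content category
    """
    # No more pairs can exist than first x second allows.
    effective_pairs = min(pairs, content_first * content_second)
    content_items = content_first + content_second + effective_pairs
    return compared * content_items


# Values taken from the examples in the parameter descriptions:
# 200 models, 200 bad reputations, 200 part names, 10,000 pairs.
print(max_candidate_combinations(200, 200, 200, 10_000))
```

Under the same assumption, settings of 1,000 compared keywords, 1,000 keywords per content category, and 10,000 pairs give 1,000 x 12,000 combinations, which is consistent with the maximum scale stated for the Alerting System.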
3.5 Correlation Detection Alert Results
Alert result display method:
After entries for correlation detection are set, a View link is generated inside the Result field in the entry list screen for the entries for which alerts are extracted as a result of the batch processing (the timing of the processing depends on the operation). Click the View link to see the list of alert results.
Link from a batch-processed entry:
Alert result:
-
Settings that are effective when the batch processing is run are shown in the top-left table.
-
The processing status is shown in the top-right table.
-
Pairs or triads of keywords and subcategories for which a correlation was detected are shown in each line of the bottom table, and these keywords and subcategories are arranged in descending order of the correlation indicator value.
-
Categories selected for a particular entry are shown in the "Software," "Technical term," and "Command" fields in the previous figure. If only one content category is specified, only the compared category and that content category are shown. Extracted keywords and subcategories are shown in each line, each with an identifier such as "XXX (keyword)" or "XXX (category)." When two content categories are specified, there are three display fields; if an extracted item is a pair rather than a triad, the remaining field is left blank.
-
The Frequency field shows the number of documents containing all the extracted keywords and subcategories.
-
The Correlation indicator field shows the index to describe the level of correlation between the content category items and compared category item (keyword or subcategory) in that line.
-
The Jump field is linked to Text Miner, and clicking it shows the Docs view, narrowed down by all the keywords or subcategories in that line.
4 Batch Processing
4.1 Increase Detection Batch Processing
This section describes the increase detection batch processing required for the operation. Skip this section if the alert settings are made via the user interface.
When the increase detection command is run on the server machine, a report file is generated as the analysis result within the directory for the specified database:
Database directory\alerting\batch\increase_detection_report.xml
Determine how to use the report file in accordance with the operation.
In the increase detection batch processing, the %TAKMI_HOME%\bin\takmi_alert_increase.bat command is used.
Prior conditions: Language processing and index creation must be completed for the database where increase detection is to be run. In addition, the Alerting System Web application must not be in operation.
Command syntax:
takmi_alert_increase.bat DATABASE_NAME MAXIMUM_ANALYSIS_TIME_BY_MINUTE JAVA_HEAP_SIZE_BY_MEGA_BYTES
-
DATABASE_NAME:
The database name defined in global_config/database_entries/database_entry@name of global_config.xml.
-
MAXIMUM_ANALYSIS_TIME_BY_MINUTE:
Specify the maximum analysis time in units of "minutes."
-
JAVA_HEAP_SIZE_BY_MEGA_BYTES:
Specify the Java™ heap size for analysis in units of "megabytes."
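As an illustration of the command syntax, the following Python sketch assembles the argument list for one run. The function name is hypothetical, and the installation directory `C:\takmi`, the 60-minute limit, and the 1024 MB heap size are example values chosen for this sketch; only the script name, the argument order, and the CALL_CENTER database name come from this document.

```python
import ntpath
from typing import List


def build_increase_command(takmi_home: str, database_name: str,
                           max_minutes: int, heap_mb: int) -> List[str]:
    """Assemble the takmi_alert_increase.bat command line as an argument list."""
    # ntpath joins with Windows separators regardless of where the sketch runs.
    script = ntpath.join(takmi_home, "bin", "takmi_alert_increase.bat")
    return [script, database_name, str(max_minutes), str(heap_mb)]


# Example values only: adjust to the actual %TAKMI_HOME% and database name.
cmd = build_increase_command(r"C:\takmi", "CALL_CENTER", max_minutes=60, heap_mb=1024)
print(" ".join(cmd))
```

The resulting list could be passed to a process launcher on the server machine, after confirming that the prior conditions listed above are met.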
4.2 Correlation Detection Batch Processing
This section describes the correlation detection batch processing required for the operation. Skip this section if the alert settings are made via the user interface.
When the correlation detection command is run on the server machine, a report file is generated as the analysis result within the directory for the specified database:
Database directory\alerting\batch\correlation_detection_report.xml
Determine how to use the report file in accordance with the operation.
In the correlation detection batch processing, the %TAKMI_HOME%\bin\takmi_alert_correlation.bat command is used.
Prior conditions: Language processing and index creation must be completed for the database where correlation detection is to be run. In addition, the Alerting System Web application must not be in operation.
Command syntax:
takmi_alert_correlation.bat DATABASE_NAME MAXIMUM_ANALYSIS_TIME_BY_MINUTE JAVA_HEAP_SIZE_BY_MEGA_BYTES
-
DATABASE_NAME:
The database name defined in global_config/database_entries/database_entry@name of global_config.xml.
-
MAXIMUM_ANALYSIS_TIME_BY_MINUTE:
Specify the maximum analysis time in units of "minutes."
-
JAVA_HEAP_SIZE_BY_MEGA_BYTES:
Specify the Java heap size for analysis in units of "megabytes."
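After either batch command has run, the report file can be located from the database directory. The Python sketch below builds the expected path for both analysis types. The function name and the database directory `C:\takmi\databases\CALL_CENTER` are hypothetical, while the `alerting\batch` subdirectory and the report file names are those given in this chapter. The structure of the XML report itself is not described in this document, so the sketch does not attempt to parse it.

```python
from pathlib import PureWindowsPath

# Report file names as stated for each batch command.
REPORT_NAMES = {
    "increase": "increase_detection_report.xml",
    "correlation": "correlation_detection_report.xml",
}


def report_path(database_dir: str, analysis: str) -> PureWindowsPath:
    """Return the expected report location for 'increase' or 'correlation'."""
    return PureWindowsPath(database_dir) / "alerting" / "batch" / REPORT_NAMES[analysis]


# Hypothetical database directory used only for illustration.
db_dir = r"C:\takmi\databases\CALL_CENTER"
print(report_path(db_dir, "increase"))
print(report_path(db_dir, "correlation"))
```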
Terms of Use
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:
IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:
IBM Corporation
Silicon Valley Lab
Building 090/H-410
555 Bailey Avenue
San Jose, CA 95141-1003
U.S.A.
Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.
The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.
Copyright License
This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
This topic lists IBM trademarks and certain non-IBM trademarks.
See for information about IBM trademarks.
The following terms are trademarks or registered trademarks of other companies:
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel Inside (logos), MMX and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product or service names might be trademarks or service marks of others.