Edition Notice
This edition applies to version 8, release 4 of IBM Content Analyzer and
to all subsequent releases and modifications until otherwise indicated
in new editions.
This document contains proprietary information of IBM. This proprietary
information is provided in accordance with the license conditions and is
protected by copyright. Information contained in this document provides
no warranties whatsoever for any products. Also, no descriptions provided
in this document should be interpreted as product warranties. Depending
on the system environment, the yen symbol may be displayed as the backslash
symbol, or the backslash symbol may be displayed as the yen symbol.
© Copyright International Business Machines Corporation 2007, 2008. All
rights reserved.
US Government Users Restricted Rights - Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corp.
1 Introduction
This document describes how to use the IBM Content Analyzer Alerting System
application.
1.1 Pop-up Help
If you click the pop-up help button next to an item on the Alerting System screen, a description of
the selected item is displayed. You can use this pop-up help to learn definitions
of terms without opening the online help.
1.2 Functional Overview
The Alerting System, which focuses on finding problems, supports heavy-load
batch analysis that cannot be performed by the interactive operations of
Text Miner. You use the Alerting System user interface to specify the categories
and parameters to be used in analyses.
- Increase detection function:
Among a large number of keywords (including phrases), this function detects
the keywords whose use has been increasing recently. For example, among
the "Noun" category keywords contained in call center data,
this function detects, week by week, the keywords whose use is increasing over time.
The results of this analysis are useful for anticipating future
problems. The increase analysis function of Text Miner can show increases and decreases
for up to 50 keywords in a graph, whereas
the Alerting System can analyze up to 20,000 keywords.
- Correlation detection function:
This function extracts combinations of keywords and subcategories that
are highly correlated with each other. For example, by extracting the correlations
between "product" category keywords and "part/phrase"
category keywords from a manufacturer's call center data, you can find the expressions
describing product defects that are frequently used in association with a
particular product. The Text Miner correlation
analysis function can analyze up to 100 keywords along the vertical axis
x 100 keywords along the horizontal axis in a two-dimensional map, whereas the
Alerting System can analyze up to 1,000 keywords x 12,000 keywords.
1.3 Page Transition
The Alerting System user interface provides the following screens for specifying
alert extraction options.
- In the Select Database screen, select the database to be used for alert
extraction. This screen is always shown, even if there is only one
database.
- In the Analysis Type Selection screen, select either Increase detection
or Correlation detection as the type of alert extraction. Because only one
person at a time can edit the settings for a particular extraction type
in a particular database, this screen shows, for each extraction type,
whether anyone is currently editing the settings.
- Edit the categories and parameters for increase detection
in the Increase Detection Entry List, Increase Detection Edit Category,
and Increase Detection Edit Parameters screens. For each database,
only one user at a time can use these screens to specify settings, and that user
holds a lock on the settings.
- The Increase Detection Alert Results screen shows a list of keywords with increased use
that were extracted in accordance with the entry settings. From each of
these keywords, you can jump to the Time Series view in Text
Miner.
- Edit the categories and parameters for correlation detection
in the Correlation Detection Entry List, Correlation Detection Edit
Category, and Correlation Detection Edit Parameters screens.
For each database, only one user at a time can use these screens to specify settings,
and that user holds a lock on the settings.
- The Correlation Detection Alert Results screen shows a list of combinations
of highly correlated keywords and categories that were extracted in accordance
with the entry settings. From each of these items, you can jump
to the Docs view of the relevant documents in Text Miner.
1.4 Database Selection
In the Select Database screen, specify the database for which to configure alert
settings. For online help, click the help button located to the right of the screen title.
Select Database screen:
1.5 Analysis Type Selection
In the Analysis Type Selection screen, select the extraction type to
use for the specified database: increase detection or correlation
detection. The following figure shows the Analysis Type Selection screen
for the CALL_CENTER database.
Analysis Type Selection screen:
- Lock:
Click either Increase detection or Correlation detection to lock the settings
(mark them so that no other users can edit them) and start
editing. When the settings are locked by another user, the
message "Locked (edited) by user_name" is displayed in the status
field. When an unregistered user is editing, the message "Locked (edited)
by anonymous_user" is displayed.
- When the previous user exits edit mode while the settings are still locked:
If a user closes the browser from one of the screens marked with a dotted
line after specifying detection settings, the settings stay locked by
that user. When this occurs, use the interrupt edit function to edit
the settings.
- Interrupt edit:
If the status field does not show that the settings are "Editable,"
clicking the link for the extraction type presents a dialog that asks
whether you want to interrupt the edit. Click OK to cancel the other user's
editing session and start editing the settings yourself.
Interrupt edit confirmation dialog:
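The locking rules above can be modeled as one lock per database and extraction type. The following Python sketch is a hypothetical illustration only: `EditLockTable` and its methods are invented names, not part of the Alerting System, and the sketch simply reproduces the status messages quoted above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

ANONYMOUS = "anonymous_user"  # shown when an unregistered user is editing


@dataclass
class EditLockTable:
    """Hypothetical lock table: one lock per (database, extraction type)."""

    holders: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def status(self, database: str, kind: str) -> str:
        """Text for the status field of the Analysis Type Selection screen."""
        holder = self.holders.get((database, kind))
        return "Editable" if holder is None else f"Locked (edited) by {holder}"

    def lock(self, database: str, kind: str, user: Optional[str]) -> bool:
        """Start editing; refuse if someone else already holds the lock."""
        name = user or ANONYMOUS
        current = self.holders.get((database, kind))
        if current is not None and current != name:
            return False
        self.holders[(database, kind)] = name
        return True

    def interrupt(self, database: str, kind: str, user: Optional[str]) -> None:
        """Interrupt edit: after OK is clicked, take over the lock unconditionally."""
        self.holders[(database, kind)] = user or ANONYMOUS

    def release(self, database: str, kind: str) -> None:
        """Normal end of editing. Closing the browser never calls this,
        which is why a stale lock can remain."""
        self.holders.pop((database, kind), None)


locks = EditLockTable()
locks.lock("CALL_CENTER", "increase", "alice")
print(locks.status("CALL_CENTER", "increase"))       # Locked (edited) by alice
print(locks.lock("CALL_CENTER", "increase", "bob"))  # False
locks.interrupt("CALL_CENTER", "increase", "bob")
print(locks.status("CALL_CENTER", "correlation"))    # Editable
```

Keying the table on both the database and the extraction type reflects the rule that increase detection and correlation detection settings are locked independently.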
2 Increase Detection
2.1 Increase Detection
In increase detection, keywords and subcategories are detected from up to 20,000
keywords and subcategories according to their ranking by the "increase
indicator within the latest time period" (that is, a ranking of how much
their frequency of appearance has increased on the latest date), based on the
Delta view of Text Miner. Because the keywords and subcategories detected
in increase detection are likely to appear even more frequently
in the future, examining them can lead
to early detection of problems. In the increase detection settings, you specify
a "vertical category" for the Delta view, and the time-series frequency
of appearance of the keywords belonging to that category and its subcategories
(including subcategories at deeper levels of the hierarchy) is analyzed.
Text Miner Delta view:
2.2 Increase Detection Entry List
- Increase detection entry:
In increase detection, a keyword group and subcategory group are specified
from a particular category as the target of detection, and this category
and a set of parameter settings associated with detection are collectively
managed as an entry.
Increase detection entry = category + parameter group
For example, when "Noun" is specified as the category and the number
of keywords to be analyzed is set to 10,000 as a parameter, the keywords whose use
has been increasing recently can be detected from 10,000 noun keywords.
In this example, the "Word (weekly)" entry consists of the "Noun" category
and parameters such as 10,000 keywords to be analyzed.
- Entry operation:
In the Increase Detection Entry List screen, use the New Entry
button to add an entry and the Delete link to delete one. A new entry needs
a name to be displayed in the list, but you can change the name later,
when you specify the parameter settings.
- Edit:
You can edit categories by clicking the Category link, and you can edit
parameters by clicking the Parameters link.
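An entry can be thought of as a small record that pairs one category with its parameter group. The following Python sketch is hypothetical: the class, its field names (which mirror the parameters in 2.4 Increase Detection Edit Parameters), and its default values are illustrative placeholders, not the product's storage format or defaults. The 20,000-keyword limit comes from the overview; the assumption that the decaying factor lies between 0 and 1 is ours.

```python
from dataclasses import dataclass

TIME_SCALES = ("monthly", "weekly", "daily")


@dataclass
class IncreaseDetectionEntry:
    """Hypothetical model of an entry: one category plus its parameter group."""

    name: str                 # entry name shown in the entry list
    category: str             # category whose keywords/subcategories are analyzed
    # Placeholder defaults for illustration only, not the product's defaults.
    max_alerts: int = 20
    max_keywords: int = 10_000
    min_frequency: int = 1
    decaying_factor: float = 0.8
    time_scale: str = "weekly"

    def __post_init__(self) -> None:
        # The Alerting System analyzes at most 20,000 keywords.
        if not 1 <= self.max_keywords <= 20_000:
            raise ValueError("max_keywords must be between 1 and 20,000")
        if self.time_scale not in TIME_SCALES:
            raise ValueError(f"time_scale must be one of {TIME_SCALES}")
        # Assumption: the decaying factor is a weight strictly between 0 and 1.
        if not 0.0 < self.decaying_factor < 1.0:
            raise ValueError("decaying_factor is assumed to lie in (0, 1)")


# The "Word (weekly)" entry from the example: the "Noun" category with
# 10,000 keywords to analyze and a weekly time scale.
word_weekly = IncreaseDetectionEntry("Word (weekly)", "Noun", max_keywords=10_000)
```

Validating in `__post_init__` means an entry with an out-of-range parameter can never be created, which is roughly what a Save button that rejects invalid input would do.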
2.3 Increase Detection Edit Category
2.4 Increase Detection Edit Parameters
Set the analysis parameters in the Increase Detection Edit Parameters screen.
To check the definition of a parameter, click the help button next to it.
- Saving the parameter settings:
After editing the parameters by using the text field or radio buttons,
click the Save button to save the changes and return to the Increase Detection
Entry List screen.
- Canceling the parameter changes:
After editing the parameters by using the text field or radio buttons,
click the Cancel button to cancel the parameter changes and return to the
Increase Detection Entry List screen.
- Entry name:
Specify the names of the entries to be listed in the Increase Detection
Entry List screen.
- Category:
The categories set in category edit mode are displayed. Categories cannot be
edited in this screen. Increase detection analyzes how the keywords
and subcategories of the specified category change over time and detects the
ones whose frequency of use has been rising notably.
- Maximum number of alerts:
Specify the maximum number of alerts to be detected. Keywords or subcategories
whose frequency of use is notably increasing are detected as alerts. Detection
is not based on a threshold; instead, the keywords or subcategories
whose increase indicators rank within the top N are detected, where N is this value.
For example, if you set the maximum number of alerts to
20, the 20 keywords or subcategories with the highest increase indicators
are returned.
- Maximum number of keywords:
Specify the maximum number of keywords or subcategories to be analyzed.
For example, when the number of keywords to be analyzed is set to 5,000,
the keywords and subcategories with a high rate of increase in frequency
of use are detected among 5,000 keywords and subcategories.
- Minimum frequency:
Each alert comes with the collection of documents retrieved under the conditions
described above. Because alerts backed by only a few documents
might be meaningless, specify the minimum number of documents
an alert requires, to narrow down the alerts to be detected. Alerts are not returned
if the number of applicable documents is below this value.
- Decaying factor:
In increase detection, keywords and subcategories that "are used relatively
more frequently recently than their past average frequency" are
detected. This parameter controls how much weight older data carries when
the past average frequency is calculated: the larger the value, the more weight
past data carries. For example, when the decaying factor is 0.8 and
the time scale is set to "month," the frequency obtained two
months ago is weighted 0.8 times as much as the frequency obtained in the previous
month when the average is calculated.
- Time scale:
Select monthly, weekly, or daily as the interval at which frequency
data is acquired when analyzing the time-series frequency of use of keywords and subcategories.
Select a long time scale if you want to analyze a slow increase in frequency
or keywords that are not used frequently. Select a short time
scale if you want to analyze a short-term increase in frequency
or keywords that are used moderately or quite frequently.
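To make the decaying factor and the top-N selection concrete, here is a minimal Python sketch. It assumes that the past average is a weighted average in which the previous period has weight 1 and each earlier period is multiplied by the decaying factor once more, which matches the 0.8 example above. The increase indicator formula (an add-one smoothed ratio of the latest frequency to the past average), the use of the latest frequency as the document count for the minimum-frequency check, and keeping the most frequent keywords for the maximum-number-of-keywords limit are all hypothetical choices; the actual formula used by the Alerting System is not documented here.

```python
from typing import Dict, List, Sequence, Tuple


def decayed_past_average(history: Sequence[float], decay: float) -> float:
    """Weighted average of every period before the latest one.

    history runs oldest -> newest; its last element is the latest period.
    The previous period has weight 1, the one before it `decay`, then
    decay**2, and so on, so a larger decay gives older data more weight.
    """
    past = list(history[:-1])
    if not past:
        return 0.0
    weights = [decay ** lag for lag in range(len(past))]  # lag 0 = previous period
    # reversed(past) walks from the previous period back to the oldest one.
    total = sum(w * f for w, f in zip(weights, reversed(past)))
    return total / sum(weights)


def increase_alerts(
    series: Dict[str, Sequence[float]],  # keyword -> frequency per period
    max_alerts: int,
    max_keywords: int,
    min_frequency: int,
    decay: float,
) -> List[Tuple[str, float, float]]:
    """Return (keyword, latest frequency, indicator) rows, highest indicator first."""
    # Assumption: the keywords analyzed are the most frequent ones overall.
    analyzed = sorted(series, key=lambda k: sum(series[k]), reverse=True)[:max_keywords]
    rows = []
    for keyword in analyzed:
        history = series[keyword]
        latest = history[-1]
        if latest < min_frequency:
            continue  # too few documents for a meaningful alert
        past = decayed_past_average(history, decay)
        # Hypothetical indicator: add-one smoothed ratio of latest to past average.
        indicator = (latest + 1) / (past + 1)
        rows.append((keyword, latest, indicator))
    # No threshold: keep the top max_alerts rows by indicator.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:max_alerts]
```

For example, for a series of 10, 10, 20 and a decaying factor of 0.8, the past average is (1 × 10 + 0.8 × 10) / 1.8 = 10, so the latest value of 20 stands out against it.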
2.5 Increase Detection Alert Results
Alert result display method:
After entries for increase detection are set, a View link is generated
in the Result field of the Increase Detection Entry List screen for
each entry for which alerts have been extracted by batch processing
(when the batch processing runs depends on your operational setup). Click the View
link to see the list of alert results.
Link from a batch-processed entry:
Alert result:
- Settings that are effective when the batch processing is run are shown
in the top-left table.
- The processing status is shown in the top-right table.
- Keywords and subcategories for which an increase in frequency of use was
detected are shown in each line of the bottom table, and these keywords
and subcategories are arranged in descending order of the increase indicator
value.
- The category selected for the entry is shown as a column heading ("Noun"
in the previous figure). Each line shows an extracted keyword or subcategory,
with an identifier such as "XXX (keyword)" or "XXX
(category)".
- The Frequency field shows the frequency of keyword use on the latest date
of time-series analysis. For example, when the time scale is set for weekly
analysis, this field shows the frequency of use of the keyword in the corresponding
line for the latest week.
- The Increase indicator field shows an index of how much the frequency
of use of that keyword has increased.
- The Jump field is linked to Text Miner, and clicking it shows the Time
Series view, which is generated based on the keyword in that line.
3 Correlation Detection
3.1 Correlation Detection
In correlation detection, combinations of keywords and subcategories that
are highly correlated with each other are extracted. The analysis performed by correlation
detection is essentially the same as the analysis carried out
in the Text Miner two-dimensional (2D) map with "Max lines
to display" set to its largest value for both the vertical and horizontal axes, except that
it can examine correlations for up to the equivalent of 1,000 x 12,000
cells.
Text Miner 2D map view:
In addition to the scale, differences between correlation detection and
the 2D map are as follows:
- In the 2D map, only the keywords and subcategories immediately under the categories arranged
along the vertical and horizontal axes are analyzed, but
in correlation detection, subcategories at deeper levels
of the category hierarchy can also be analyzed. For this reason, correlations
at intermediate levels of the hierarchy can be analyzed by defining
product classification levels as categories. For example, assume
that there are product classifications as shown below:
Category tree:
Product
  T-shirt
    ABC-T-shirt
    XYZ-T-shirt
      XYZ-T-shirt L
      XYZ-T-shirt M
      XYZ-T-shirt S
  Jacket
  Shoes
If there is a strong correlation between "XYZ-T-shirt L" and
the phrase "size … large," the problem is likely limited to the size indication
of the large size only. If, however, there is a strong correlation
between "XYZ-T-shirt" and "size … large," the overall
size indication system for the XYZ brand might have a problem.
- In the 2D map, the vertical axis and the horizontal axis play equivalent
roles: for example, the keywords and subcategories of the category specified as
the horizontal category are simply arranged along the horizontal axis. In correlation
detection, however, the analysis axes are divided into the compared category and the
content category, and keywords and subcategories in the compared category are compared
from the perspective of keywords and subcategories in the content category.
Also, up to two content categories can be specified, so that along the axis
corresponding to the 2D map horizontal axis, not only individual keywords and subcategories
but also pairs of them can be analyzed. The following figure illustrates this.
When a correlation analysis is visualized using the 2D map:
Compared category: product
First content category: noun phrase
Second content category: purchase history
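The category tree example in the first difference above can be sketched as follows. In this hypothetical Python illustration, every document is counted not only under the category it is tagged with but also under all the categories above it, so "XYZ-T-shirt" collects the documents of its L, M, and S subcategories. Correlation is measured here with lift (observed co-occurrence divided by the co-occurrence expected by chance), a common measure used only for illustration; it is not necessarily the correlation indicator that the Alerting System computes, and the documents are invented.

```python
from typing import Dict, List, Set

# Hypothetical encoding of the category tree (child -> parent).
PARENT: Dict[str, str] = {
    "T-shirt": "Product",
    "Jacket": "Product",
    "Shoes": "Product",
    "ABC-T-shirt": "T-shirt",
    "XYZ-T-shirt": "T-shirt",
    "XYZ-T-shirt L": "XYZ-T-shirt",
    "XYZ-T-shirt M": "XYZ-T-shirt",
    "XYZ-T-shirt S": "XYZ-T-shirt",
}


def with_ancestors(category: str) -> List[str]:
    """Return the category itself followed by every category above it."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def rolled_up_index(doc_tags: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Map each category, at every level, to the documents at or below it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, tags in doc_tags.items():
        for tag in tags:
            for level in with_ancestors(tag):
                index.setdefault(level, set()).add(doc_id)
    return index


def lift(a: Set[int], b: Set[int], total_docs: int) -> float:
    """Observed co-occurrence of a and b divided by the count expected by chance."""
    if not a or not b:
        return 0.0
    return len(a & b) * total_docs / (len(a) * len(b))


# Ten hypothetical documents; "size ... large" appears in documents 0 and 1.
docs = {0: ["XYZ-T-shirt L"], 1: ["XYZ-T-shirt L"], 2: ["XYZ-T-shirt M"],
        3: ["XYZ-T-shirt S"], 4: ["ABC-T-shirt"]}
docs.update({i: ["Jacket"] for i in range(5, 10)})
size_large = {0, 1}
index = rolled_up_index(docs)
for level in ("XYZ-T-shirt L", "XYZ-T-shirt", "T-shirt"):
    print(level, lift(index[level], size_large, len(docs)))
```

Here the lift is 5.0 for "XYZ-T-shirt L" but only 2.5 for "XYZ-T-shirt", pointing at the large size. If the phrase instead appeared in documents 0, 2, and 3, the brand-level lift (2.5) would exceed the L-level lift (about 1.67), pointing at the XYZ brand as a whole.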
3.2 Correlation Detection Entry List
- Correlation detection entry:
In correlation detection, a set of compared category, content category,
and parameters associated with detection is managed as an entry. Refer
to for a description of compared categories and content categories.
Correlation detection entry = compared category + content
category + parameter group
- Entry operation:
In the Correlation Detection Entry List screen, use the New Entry
button to add an entry and the Delete link to delete one. A new entry needs
a name to be displayed in the list, but you can change the name later,
when you specify the parameter settings.
- Edit:
You can edit categories (compared categories and content categories) by
clicking the Category link, and you can edit parameters by clicking the
Parameters link.
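A correlation detection entry can likewise be captured as a small validated record, using the constraints stated in 3.1 and 3.4: one or two content categories, and keyword pairs only when two content categories are specified. This Python sketch is hypothetical: the class, its field names, the entry name, and the placeholder defaults are not the product's.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CorrelationDetectionEntry:
    """Hypothetical entry: compared category + content categories + parameters."""

    name: str
    compared_category: str
    content_categories: Tuple[str, ...]      # one or two categories
    # Placeholder defaults for illustration only, not the product's defaults.
    max_alerts: int = 20
    max_compared_keywords: int = 200
    max_content_keywords: int = 200
    max_content_pairs: Optional[int] = None  # meaningful only with two content categories
    min_frequency: int = 1
    confidence: float = 0.95

    def __post_init__(self) -> None:
        if len(self.content_categories) not in (1, 2):
            raise ValueError("specify one or two content categories")
        if self.max_content_pairs is not None and len(self.content_categories) != 2:
            raise ValueError("keyword pairs require two content categories")

    @property
    def uses_pairs(self) -> bool:
        """True when pairs of content keywords are part of the analysis."""
        return self.max_content_pairs is not None


# Hypothetical entry built from the Model / Bad Reputation / Part Name examples.
entry = CorrelationDetectionEntry(
    name="Model defects",
    compared_category="Model",
    content_categories=("Bad Reputation", "Part Name"),
    max_content_pairs=10_000,
)
```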
3.3 Correlation Detection Edit Category
3.4 Correlation Detection Edit Parameters
Set the analysis parameters in the Correlation Detection Edit Parameters
screen. To check the definition of a parameter, click the help button next to it.
- Saving the parameter settings:
After editing the parameters by using the text field or radio buttons,
click the Save button to save the changes and return to the Correlation
Detection Entry List screen.
- Canceling the parameter changes:
After editing the parameters by using the text field or radio buttons,
click the Cancel button to cancel the parameter changes and return to the
Correlation Detection Entry List screen.
- Entry name:
Specify the names of the entries to be listed in the Correlation Detection
Entry List screen.
- Compared category:
Compared categories set in category edit mode are displayed. Compared categories
cannot be edited in this screen. Refer to for a description of compared categories.
- Content category:
Content categories set in category edit mode are displayed. Content categories
cannot be edited in this screen. Refer to for a description of content categories.
- Maximum number of alerts:
Specify the maximum number of alerts to be detected. Alerts are detected
as combinations of [keyword or subcategory of the compared category, keyword
or subcategory of the content category (or a pair of them, when two content
categories are specified)], and are returned in descending order of the strength
of correlation between the compared category item and the content category item,
up to the number specified in this setting. For example,
if you set the maximum number of alerts to 20, the top 20 combinations
of highly correlated keywords and subcategories are returned.
Meaning of the parameters when a correlation analysis is visualized using
the 2D map:
(1): number of compared category keywords
(2): number of content category keywords
(3): number of pairs of content category keywords
- Maximum number of keywords in compared category:
Set the maximum number of compared category keywords or subcategories to
be used in the analysis. For example, when the Model category is set as
the compared category and at the same time the number of compared category
keywords is set to 200, the 200 most frequently mentioned models will be
the subject of the analysis.
- Maximum number of keywords in content category:
Set the maximum number of content category keywords or subcategories to
be used in the analysis (up to two content categories can be specified).
For example, when the Bad Reputation and Part Name categories are set as
the content categories and at the same time the number of content category
keywords is set to 200, up to 200 types of bad reputations and up to 200
part names will be the subject of the analysis.
- Maximum number of keyword pairs in content category:
When two content categories are specified, set the maximum number of pairs
consisting of a keyword or subcategory of the first content category
and a keyword or subcategory of the second content category. For
example, when the number of pairs of content category keywords is set to
10,000 while Bad Reputation and Part Name are specified as the content
categories, up to 10,000 pairs of the form [expression of bad reputation,
part name] are analyzed, and an alert is returned
if a strong correlation is detected between a compared category keyword
or subcategory and one of these pairs.
- Minimum frequency:
Each alert comes with the collection of documents retrieved under the conditions
described above. Because alerts backed by only a few documents
might be meaningless, specify the minimum number of documents
an alert requires, to narrow down the alerts to be detected. Alerts are not returned
if the number of applicable documents is below this value.
- Confidence coefficient:
This parameter is used when the correlation value (correlation strength) is
calculated statistically. If the coefficient is set high, the alerts returned tend
to be those with enough applicable documents to calculate the correlation strength
reliably. If it is set low, a larger number of alerts is returned, including
speculative ones that are not backed by enough applicable documents to calculate
the correlation strength reliably.
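The interplay of the minimum frequency, the confidence coefficient, and the maximum number of alerts can be illustrated with a confidence-adjusted ranking. The Python sketch below swaps in a well-known technique for illustration: the one-sided Wilson score lower bound on P(content item | compared item), divided by the overall rate of the content item. It is not the Alerting System's documented formula, and all names and counts are hypothetical. With a higher confidence coefficient, combinations backed by few documents are penalized more, which mirrors the behavior described above.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, List, Tuple


def wilson_lower_bound(successes: int, trials: int, confidence: float) -> float:
    """One-sided Wilson score lower bound for the proportion successes/trials."""
    if trials == 0:
        return 0.0
    z = NormalDist().inv_cdf(confidence)  # 0.5 -> 0.0, i.e. no adjustment
    p = successes / trials
    centre = p + z * z / (2 * trials)
    margin = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / (1 + z * z / trials)


def correlation_alerts(
    co_counts: Dict[Tuple[str, str], int],  # documents containing both items
    compared_counts: Dict[str, int],        # documents containing the compared item
    content_counts: Dict[str, int],         # documents containing the content item or pair
    total_docs: int,
    max_alerts: int,
    min_frequency: int,
    confidence: float,
) -> List[Tuple[str, str, int, float]]:
    """Return (compared, content, frequency, indicator) rows, strongest first."""
    rows = []
    for (compared, content), both in co_counts.items():
        if both < min_frequency:
            continue  # too few documents behind this combination
        # Conservative estimate of P(content | compared) at the given confidence.
        lower = wilson_lower_bound(both, compared_counts[compared], confidence)
        baseline = content_counts[content] / total_docs  # P(content) overall
        rows.append((compared, content, both, lower / baseline))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:max_alerts]


# Hypothetical counts: model M1 is rare, model M2 is common.
co = {("M1", "crack"): 2, ("M2", "crack"): 40}
compared = {"M1": 2, "M2": 50}
content = {"crack": 100}
for c in (0.5, 0.95):
    top = correlation_alerts(co, compared, content, 1000, 2, 1, c)
    print(c, [row[0] for row in top])
```

In this example, a combination seen in 2 of 2 documents outranks one seen in 40 of 50 documents at a confidence of 0.5 (no adjustment), but falls behind it at 0.95.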
3.5 Correlation Detection Alert Results
Alert result display method:
After entries for correlation detection are set, a View link is generated
in the Result field of the Correlation Detection Entry List screen for each entry
for which alerts have been extracted by batch processing (when
the batch processing runs depends on your operational setup). Click the View link
to see the list of alert results.
Link from a batch-processed entry:
Alert result:
- Settings that are effective when the batch processing is run are shown
in the top-left table.
- The processing status is shown in the top-right table.
- Pairs or triads of keywords and subcategories for which a correlation was
detected are shown in each line of the bottom table, and these keywords
and subcategories are arranged in descending order of the correlation indicator
value.
- The categories selected for the entry are shown as column headings ("Software,"
"Technical term," and "Command" in the previous
figure). If only one content category is specified, only two columns are shown:
the compared category and the content category. Each line shows the extracted
keywords and subcategories, each with an identifier such as "XXX
(keyword)" or "XXX (category)".
When two content categories are specified, there are three columns,
and if an extracted item is a pair, the remaining column is left
blank.
- The Frequency field shows the number of documents containing all the extracted
keywords and subcategories.
- The Correlation indicator field shows an index of the strength of
correlation between the content category items and the compared category item
(keyword or subcategory) in that line.
- The Jump field is linked to Text Miner, and clicking it shows the Docs
view, narrowed down by all the keywords or subcategories in that line.
4 Batch Processing
4.1 Increase Detection Batch Processing
This section describes the increase detection batch processing required
for the operation. Skip this section if the alert settings are made via
the user interface. When the increase detection command is run on the server
machine, a report file is generated as the analysis result within the directory
for the specified database:
Database directory\alerting\batch\increase_detection_report.xml
Decide how to use the report file according to your operational requirements.
In the increase detection batch processing, the
%TAKMI_HOME%\bin\takmi_alert_increase.bat
command is used.
Prior conditions:
Language processing and index creation must be completed for the database
in which increase detection is to be run. In addition, the Alerting
System Web application must not be running.
Command syntax:
takmi_alert_increase.bat DATABASE_NAME MAXIMUM_ANALYSIS_TIME_BY_MINUTE JAVA_HEAP_SIZE_BY_MEGA_BYTES
- DATABASE_NAME:
The database name defined in global_config/database_entries/database_entry@name
of global_config.xml.
- MAXIMUM_ANALYSIS_TIME_BY_MINUTE:
Specify the maximum analysis time in units of "minutes."
- JAVA_HEAP_SIZE_BY_MEGA_BYTES:
Specify the Java™ heap size for analysis in units of "megabytes."
4.2 Correlation Detection Batch Processing
This section describes the correlation detection batch processing required
for the operation. Skip this section if the alert settings are made via
the user interface. When the correlation detection command is run on the
server machine, a report file is generated as the analysis result within
the directory for the specified database:
Database directory\alerting\batch\correlation_detection_report.xml
Decide how to use the report file according to your operational requirements.
In the correlation detection batch processing, the
%TAKMI_HOME%\bin\takmi_alert_correlation.bat
command is used.
Prior conditions:
Language processing and index creation must be completed for the database
in which correlation detection is to be run. In addition, the
Alerting System Web application must not be running.
Command syntax:
takmi_alert_correlation.bat DATABASE_NAME MAXIMUM_ANALYSIS_TIME_BY_MINUTE JAVA_HEAP_SIZE_BY_MEGA_BYTES
- DATABASE_NAME:
The database name defined in global_config/database_entries/database_entry@name
of global_config.xml.
- MAXIMUM_ANALYSIS_TIME_BY_MINUTE:
Specify the maximum analysis time in units of "minutes."
- JAVA_HEAP_SIZE_BY_MEGA_BYTES:
Specify the Java heap size for analysis in units of "megabytes."
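When the batch commands are scheduled from a script, it can help to assemble the command line and the expected report location in one place. The following Python sketch is a hypothetical helper, not part of the product: it only builds the argument list from %TAKMI_HOME%, the command names, and the three arguments documented in 4.1 and 4.2, plus the report path from a database directory that you supply. The location of the database directory is not documented in this section, so it is a parameter; the example values (C:\TAKMI, 120 minutes, 1,024 MB) are invented.

```python
import ntpath
import os
from typing import List, Mapping, Optional

# Command and report file names as documented in 4.1 and 4.2.
COMMANDS = {
    "increase": "takmi_alert_increase.bat",
    "correlation": "takmi_alert_correlation.bat",
}
REPORTS = {
    "increase": "increase_detection_report.xml",
    "correlation": "correlation_detection_report.xml",
}


def build_command(kind: str, database_name: str, max_minutes: int, heap_mb: int,
                  env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the argument list for a batch run (hypothetical helper)."""
    if max_minutes <= 0 or heap_mb <= 0:
        raise ValueError("analysis time and heap size must be positive")
    env = os.environ if env is None else env
    # ntpath builds Windows-style paths even when this code runs elsewhere.
    command = ntpath.join(env["TAKMI_HOME"], "bin", COMMANDS[kind])
    return [command, database_name, str(max_minutes), str(heap_mb)]


def report_path(database_dir: str, kind: str) -> str:
    """Return where the report file is written inside the database directory."""
    return ntpath.join(database_dir, "alerting", "batch", REPORTS[kind])


# Example: correlation detection for CALL_CENTER, at most 120 minutes, 1,024 MB heap.
print(build_command("correlation", "CALL_CENTER", 120, 1024,
                    env={"TAKMI_HOME": r"C:\TAKMI"}))
```

On the Windows server, the resulting list could be passed to a process launcher such as `subprocess.run`; check the prior conditions above before starting a run.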
Terms of Use
Notices
This information was developed for products and services offered in the
U.S.A.
IBM may not offer the products, services, or features discussed in this
document in other countries. Consult your local IBM representative for
information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to
state or imply that only that IBM product, program, or service may be used.
Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However,
it is the user's responsibility to evaluate and verify the operation of
any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant
you any license to these patents. You can send license inquiries, in writing,
to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact
the IBM Intellectual Property Department in your country or send inquiries,
in writing, to:
IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan
The following paragraph does not apply to the United Kingdom or any other
country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION
"AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not
allow disclaimer of express or implied warranties in certain transactions,
therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical
errors. Changes are periodically made to the information herein; these
changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s)
described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for
convenience only and do not in any manner serve as an endorsement of those
Web sites. The materials at those Web sites are not part of the materials
for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way
it believes appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the
purpose of enabling: (i) the exchange of information between independently
created programs and other programs (including this one) and (ii) the mutual
use of the information which has been exchanged, should contact:
IBM Corporation
Silicon Valley Lab
Building 090/H-410
555 Bailey Avenue
San Jose, CA 95141-1003
U.S.A.
Such information may be available, subject to appropriate terms and conditions,
including in some cases, payment of a fee.
The licensed program described in this document and all licensed material
available for it are provided by IBM under terms of the IBM Customer Agreement,
IBM International Program License Agreement or any equivalent agreement
between us.
Information concerning non-IBM products was obtained from the suppliers
of those products, their published announcements or other publicly available
sources. IBM has not tested those products and cannot confirm the accuracy
of performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to
the suppliers of those products.
All statements regarding IBM's future direction or intent are subject to
change or withdrawal without notice, and represent goals and objectives
only.
This information contains examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples
include the names of individuals, companies, brands, and products. All
of these names are fictitious and any similarity to the names and addresses
used by an actual business enterprise is entirely coincidental.
Copyright License
This information contains sample application programs in source language,
which illustrate programming techniques on various operating platforms.
You may copy, modify, and distribute these sample programs in any form
without payment to IBM, for the purposes of developing, using, marketing
or distributing application programs conforming to the application programming
interface for the operating platform for which the sample programs are
written. These examples have not been thoroughly tested under all conditions.
IBM, therefore, cannot guarantee or imply reliability, serviceability,
or function of these programs.
Trademarks
This topic lists IBM trademarks and certain non-IBM trademarks.
See for information about IBM trademarks.
The following terms are trademarks or registered trademarks of other companies:
Java and all Java-based trademarks and logos are trademarks or registered
trademarks of Sun Microsystems, Inc. in the United States, other countries,
or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of
Microsoft Corporation in the United States, other countries, or both.
Intel, Intel Inside (logos), MMX and Pentium are trademarks of Intel Corporation
in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and
other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries,
or both.
Other company, product or service names might be trademarks or service
marks of others.