IBM Content Analyzer Dictionary Editor Guide
Edition Notice
This edition applies to version 8, release 4 of IBM Content Analyzer and
to all subsequent releases and modifications until otherwise indicated
in new editions.
This document contains proprietary information of IBM. This proprietary
information is provided in accordance with the license conditions and is
protected by copyright. Information contained in this document provides
no warranties whatsoever for any products. Also, no descriptions provided
in this document should be interpreted as product warranties. Depending
on the system environment, the yen symbol may be displayed as the backslash
symbol, or the backslash symbol may be displayed as the yen symbol.
© Copyright International Business Machines Corporation 2007, 2008. All
rights reserved.
US Government Users Restricted Rights - Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corp.
1 Introduction
This document describes how to use the IBM Content Analyzer Dictionary
Editor application.
1.1 Functional Overview
The Dictionary Editor is a Web application that you can use to edit the
following items. For definitions of terms such as category, keyword, and synonym, see the IBM Content Analyzer product documentation.
- Edit the category tree:
Add or delete categories.
- Edit keywords:
Add or delete keywords, or register them in a category.
- Edit synonyms:
Add synonyms to keywords or delete synonyms from keywords; also, select
synonyms to be used as keywords.
The following figure shows the relationship between editing a dictionary
with Dictionary Editor and analysis by Text Miner:
1.2 Dictionary Resource Files
The Dictionary Editor supports editing operations by multiple users. To
avoid editing conflicts, Dictionary Editor includes a mechanism to lock
the files to be edited and prevent other users from editing the same files.
Smooth operation can be ensured if users know which files might cause conflicts
when they edit them. A description of each file type is as follows:
- Category tree file:
The entire category tree is saved as one file, and you use the Edit category
tree screen to edit it. Because editing of the category tree and editing
of keywords conflict with each other, other users cannot edit keywords
while you edit the category tree. At the same time, you cannot edit the
category tree while another user is editing keywords.
- Keyword file:
A keyword file saves a list of keywords and synonyms with their category
information. Only one user can edit a keyword file at a time, so create
a separate keyword file for each operator to avoid conflicts. Keyword files
are usually created per category (for example, a product name dictionary
and a service name dictionary), and operators are assigned by category
in the same way. Note that the category tree and keyword files cannot be
edited at the same time because they conflict with each other.
- Candidate word file:
A candidate word file supplies words that can be added to the IBM Content
Analyzer dictionary, such as frequently used character strings that have
been extracted, or lists of product names or service names retrieved from
internal databases. Because this file is read-only, multiple users can
use it simultaneously.
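The conflict rules above can be summarized in a small model. The following Python sketch is illustrative only and is not product code: the class name DictionaryLocks, its methods, and the user and file names are hypothetical, and the interrupt operation described later in this guide is not modeled. The read-only candidate word file needs no lock, so it does not appear.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DictionaryLocks:
    """Toy model of the conflict rules described above (hypothetical, not product code)."""

    tree_editor: Optional[str] = None  # user currently editing the category tree
    keyword_editors: Dict[str, str] = field(default_factory=dict)  # keyword file -> user

    def start_tree_edit(self, user: str) -> bool:
        # The category tree conflicts with every keyword file, so it can be
        # locked only when nobody is editing the tree or any keyword file.
        if self.tree_editor is not None or self.keyword_editors:
            return False
        self.tree_editor = user
        return True

    def start_keyword_edit(self, user: str, keyword_file: str) -> bool:
        # A keyword file has one editor at a time and cannot be edited
        # while the category tree is locked.
        if self.tree_editor is not None or keyword_file in self.keyword_editors:
            return False
        self.keyword_editors[keyword_file] = user
        return True

    def finish(self, user: str) -> None:
        # Ending the edit operation releases every lock held by the user.
        if self.tree_editor == user:
            self.tree_editor = None
        self.keyword_editors = {
            f: u for f, u in self.keyword_editors.items() if u != user
        }


locks = DictionaryLocks()
print(locks.start_keyword_edit("operator_a", "product_names"))  # True
print(locks.start_keyword_edit("operator_b", "service_names"))  # True: different file
print(locks.start_keyword_edit("operator_c", "product_names"))  # False: file in use
print(locks.start_tree_edit("operator_c"))                      # False: keywords being edited
locks.finish("operator_a")
locks.finish("operator_b")
print(locks.start_tree_edit("operator_c"))                      # True
```

The example shows why one keyword file per operator keeps operators out of each other's way, while a category tree edit requires that everyone else has finished.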
1.3 Page Transition
Dictionary Editor provides the following screens for editing the category
tree, keywords, and synonyms.
- In the Select Database screen, select a database to be edited.
- Menu items are always displayed in the left frame, but edit menu items
are not available until a database is selected.
- In the Configuration screen, set parameters for keyword edit.
- Add or delete categories in the Edit category tree screen.
- In the Select keyword file dialog, specify a keyword file to be edited.
You can also specify a candidate word file, if it exists.
- In the Edit keyword candidate mode screen, add or delete keywords. In this
mode, you can add new keywords by entering the words. You can also add
candidate words in the candidate word file as keywords.
- In the Edit keyword category tree mode screen, in addition to adding or
deleting keywords, you can register keywords in categories or delete keywords
from categories. The category tree is displayed in this mode, and keywords
can be searched for each currently registered category.
- In the Edit synonyms screen, add synonyms to keywords or delete synonyms
from keywords. When synonyms are already set, you can use a particular
synonym as the keyword, in which case the current keyword becomes its
synonym.
2 Before Editing the Dictionary
2.1 Initial Screen and Database Selection
In the initial state, no database is selected, and of the menu items
listed on the left side of the screen, only "Select Database"
and "Help" are active.
Select a database in the initial screen:
- Click Select Database under Menu.
- Select the database that is to be used in the dictionary edit operation
from the list.
- Click OK.
2.2 After Selecting a Database
The following screen is shown after you select a database.
After selecting a database:
- A message saying that the selected database has been loaded is displayed.
- The selected database is shown in the Current Database area.
- Configuration, Edit Category Tree, Edit Keywords and Edit Rules become active.
2.3 Editing the Settings
To change the keyword edit settings, click Configuration under Menu.
Configuration screen:
- Click Configuration under Menu to open the Configuration screen.
- Use the select box to specify the number of keywords to be displayed in
a page.
- Click the Save button to save the change.
Note: The value set here becomes the maximum number of lines in the candidate
word list and the registered keyword list in the Edit keyword screen.
2.4 After Editing the Settings
The following screen is shown after you edit the settings.
After editing the settings:
A message to confirm that the new settings have been saved is displayed.
3 Editing the Category Tree
3.1 How to Start
To edit the category tree, click Edit Category Tree from the Menu.
How to start editing the category tree:
- Click Edit Category Tree under Menu.
- Currently registered categories are displayed.
3.2 Warning when Editing the Category Tree
When multiple users are editing the dictionary, you must ensure that no
other users are using Dictionary Editor when you edit the category tree.
The following warning message appears if you try to edit the category tree
while another user is editing the category tree or keywords.
Category tree edit interrupt warning:
Click the OK button to interrupt the other user's edit operation. Unsaved
data that is being edited by that user will be discarded. The same warning
message is also displayed if you, or the user who last edited the category
tree or keywords, closed the window without properly completing the edit
operation. In that case, clicking OK to start editing does not affect the
other user, who has already finished editing.
3.3 Adding a Category
Adding a category:
- To add a category, click Add Subcategory at the next hierarchy level.
- Type a category name in the dialog box.
- Click OK.
Notes:
- To create a category without a parent (root category), click Add Root Category
at the top hierarchy level.
- The category will not be added if you click Cancel in the dialog box.
3.4 Renaming a Category
Renaming a category:
- Click the Rename link to the right of the category name that you want to
change.
- Type a category name in the dialog box.
- Click OK.
Note: The category name will not be changed if you click Cancel in the
dialog box.
3.5 Deleting a Category
Deleting a category:
- Click the Delete link to the right of the category name that you want to
delete.
- Click OK in the confirmation dialog box.
Notes:
- The category will not be deleted if you click Cancel in the dialog box.
- When a category is deleted, keyword information registered in that category
is also deleted (the keywords themselves are not deleted). Registered keyword
information is not restored even if the same category is recreated;
therefore, be careful when you delete a category.
3.6 Saving and Exiting Edit Mode
After editing the category tree, you must run the termination processing
regardless of whether you made changes such as adding or deleting categories,
and regardless of whether the changes need to be saved. If the screen
is closed while the edit operation continues, the category tree file remains
locked, and other users must interrupt your edit operation before they can
edit the category tree or keywords.
Category save/quit menu:
(1) Save the current changes and continue the operation. The file stays
locked; therefore, termination processing (2) or (3) is necessary.
(2) Save the changes and exit the category tree edit mode. The file will
be unlocked.
(3) Exit the category tree edit mode without saving changes. The file will
be unlocked.
The following screen is shown after you save and exit the edit mode:
(1) The termination message appears.
(2) The file is unlocked, and Edit Category Tree, Edit Keywords and Edit Rules become
active again.
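The three menu items differ only in whether changes are saved and whether the lock is released. The following Python sketch restates that as a function; it is a simplified model, and the names ExitAction and apply_action are hypothetical, not part of the product.

```python
from enum import Enum
from typing import List, Tuple


class ExitAction(Enum):
    SAVE_AND_CONTINUE = 1    # menu item (1)
    SAVE_AND_EXIT = 2        # menu item (2)
    EXIT_WITHOUT_SAVING = 3  # menu item (3)


def apply_action(
    action: ExitAction, saved: List[str], working: List[str]
) -> Tuple[List[str], bool]:
    """Return (saved_state, still_locked) after a save/quit menu action."""
    if action is ExitAction.SAVE_AND_CONTINUE:
        return list(working), True   # changes saved, file stays locked
    if action is ExitAction.SAVE_AND_EXIT:
        return list(working), False  # changes saved, file unlocked
    return list(saved), False        # changes dropped, file unlocked


saved = ["Product"]
working = ["Product", "Hardware"]
print(apply_action(ExitAction.SAVE_AND_CONTINUE, saved, working))
print(apply_action(ExitAction.EXIT_WITHOUT_SAVING, saved, working))
```

Only items (2) and (3) return an unlocked state, which is why one of them is always required before leaving the screen.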
3.7 Automatically Generated Dependency Categories
When the category tree is edited, categories for dependency keywords are
automatically created. These categories cannot be seen while using Dictionary
Editor, but they can be used with Text Miner.
Dictionary Editor category tree:
Product
    Hardware
    Software

↓

Text Miner category tree:
Product
    Hardware
        Dependency
            Hardware .. bad reputation
            Hardware .. verbs
            Hardware .. problem
            Hardware .. good reputation
            Hardware .. senses
            Hardware .. requests
            Hardware .. questions
    Software
        Dependency
            Software .. bad reputation
            Software .. verb
            Software .. problem
            Software .. good reputation
            Software .. senses
            Software .. requests
            Software .. questions
    Dependency
        Product .. bad reputation
        Product .. verb
        Product .. problem
        Product .. good reputation
        Product .. senses
        Product .. requests
        Product .. questions
In this example, a "Dependency" category is added immediately
below each of the "Product," "Hardware," and "Software"
categories, and below each of these, categories for phrases that use various
types of declinable words are added. Dependency expressions belonging to these
dependency categories are phrases consisting of keywords registered in
the individual category and indeclinable words, in the same manner as the
basic dependency categories. Note, however, that the "Dependency" category
immediately below the "Product" category is only for phrases consisting of
keywords that belong to the "Product" category and various indeclinable
words; dependency involving the "Hardware" and "Software"
categories is not included.
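The expansion shown in the example can be sketched in Python as follows. This is only a model of the resulting tree shape, not the product's implementation: the function name and the nested-dictionary representation are assumptions, and the list of dependency labels is copied from the example (which spells one label both "verb" and "verbs"; "verb" is used here).

```python
from typing import Dict

# Dependency subcategory labels taken from the example in this section.
DEPENDENCY_TYPES = [
    "bad reputation", "verb", "problem", "good reputation",
    "senses", "requests", "questions",
]


def add_dependency_categories(name: str, children: Dict[str, dict]) -> Dict[str, dict]:
    """Return the children of `name` with a Dependency subtree appended."""
    # Expand each child category first, then append this category's own
    # Dependency category, which covers only keywords registered in `name`.
    expanded = {
        child: add_dependency_categories(child, sub)
        for child, sub in children.items()
    }
    expanded["Dependency"] = {f"{name} .. {label}": {} for label in DEPENDENCY_TYPES}
    return expanded


def print_tree(tree: Dict[str, dict], indent: int = 0) -> None:
    for name, children in tree.items():
        print(" " * indent + name)
        print_tree(children, indent + 4)


editor_tree = {"Product": {"Hardware": {}, "Software": {}}}
text_miner_tree = {
    root: add_dependency_categories(root, subtree)
    for root, subtree in editor_tree.items()
}
print_tree(text_miner_tree)
```

Each category receives its own Dependency subtree, so the one under Product contains only "Product .. ..." entries and none for Hardware or Software.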
4 Editing Keywords
4.1 How to Start
To edit keywords, click Edit Keywords from the Menu.
How to start editing keywords:
Click Edit Keywords to open the keyword file selection dialog.
- Select the check box if you want to use candidate word files.
- Specify the keyword file to be edited.
- Click OK. The dialog closes and the candidate words display mode of the
edit keyword screen starts.
Hint: To create a new keyword file, select New File, type a file name without
an extension in the text field, and then click OK.
4.2 Warning when Editing Keywords
If another user is editing the category tree, a pop-up warning message
is displayed.
Keyword edit interrupt warning:
Click the OK button to interrupt the other user's edit operation. Unsaved
category tree data that is being edited by that user will be discarded.
The same warning message is also displayed if you, or the user who last
edited the category tree, closed the window without properly completing
the edit operation. In that case, clicking OK to start editing does not
affect the other user, who has already finished editing.
4.3 Keyword Files Currently Being Edited
When selecting a keyword file in the keyword file selection dialog box,
if another user is editing the keyword file, the message "Used by
another user" appears on the right side of the file currently being
edited.
Keyword file selection dialog while the keyword file is being edited:
Select the keyword file that is being edited and then click OK to open
a dialog to confirm interrupting the edit. Click OK again. Unsaved edit
data created by another user will be discarded.
4.4 Candidate Words Display Mode
The structure of the Edit keyword screen in the candidate words display
mode is as follows.
Edit keyword screen in the candidate words display mode:
(1) Search/Sort menu: use this area to narrow down or sort candidate words
or keywords that are displayed in (3) and (4). Operations in this area
will be reflected in both lists at the same time.
(2) Display mode select box: use this select box to switch the display
modes between the candidate words display mode and category tree display
mode.
(3) Candidate word list: a list of candidate words will be displayed when
two or more candidate files are selected in the keyword file selection
dialog.
(4) Registered keyword list: this is a list of keywords that are already
registered in the keyword file. Keywords can be registered while comparing
the candidate word list with the registered keyword list.
(5) Add and Delete: use these arrows to add or delete keywords. Use the
right arrow to add keywords and use the left arrow to delete keywords.
(6) Save buttons: Use these buttons to save keyword edit information and
exit the edit mode.
4.5 Search and Sort
The candidate word list and the registered keyword list can be further
narrowed or sorted by using the word type filter (search), string match
(search), and sort functions.
Search/Sort menu:
- Word type filter: Narrow the set of candidate words and keywords by type.
- Cancel: Disable the word type filter.
- Hiragana: List candidate words and keywords consisting only of Japanese
hiragana.
- Katakana: List candidate words and keywords consisting only of Japanese
katakana.
- Alphanumeric characters: List candidate words and keywords consisting only
of alphanumeric characters.
- String match: Type a character string in the text field and click the Search
button to search for candidate words and keywords that use the specified
character string.
- Cancel: Cancel search.
- Partial: Search candidate words and keywords that contain the specified
character string.
- Exact: Search candidate words and keywords that are exactly the same as
the input character string. Use this option to see if a particular keyword
is already registered.
- Prefix: Search candidate words and keywords that start with the specified
character string.
- Suffix: Search candidate words and keywords that end with the specified
character string.
- Sort: Specify the order in which candidate words and keywords are displayed.
- Frequency: If the candidate word file was created with data on how often
each candidate word appears in documents, the candidate
words are listed in order of frequency of appearance. Character strings
that are used frequently are considered likely to be effective as
keywords.
- Confidence score: If the candidate word file was created with confidence
scores (which indicate how likely an individual candidate word is to be
useful as a keyword), the candidate words are
listed in descending order of score.
- Alphabetical: Candidate words and keywords are listed in order of their
Unicode character codes. In Unicode, characters, letters,
and numbers are arranged in the following order: Japanese hiragana, Japanese
katakana, Chinese characters, numbers, uppercase alphabet, and lowercase
alphabet. In Japanese hiragana and katakana, characters are listed in the
order of the Japanese syllabary; however, small characters (such as
the small "a" in Japanese) come before the regular character,
and characters with the voiced sound symbol come after the regular character.
Chinese characters are sorted and listed by radical, basically
in the same manner as in a kanji (Chinese character) dictionary.
- Modification time: Candidate words and keywords are listed in order of
date of modification such as the addition to or deletion from keywords,
editing of categories, and editing of synonyms. The newly modified keywords
are shown at the top of the list for easier operation.
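The word type filter, the four string match modes, and the Alphabetical sort described above can be approximated with a few lines of Python. This is a sketch for illustration only: the function names, the sample words, and the exact character ranges used for each word type are assumptions, and the product may treat edge cases (for example, spaces or full-width characters) differently.

```python
import re

# Word type filters. The character ranges are an assumption for illustration.
WORD_TYPE_PATTERNS = {
    "hiragana": re.compile(r"[\u3041-\u3096]+"),
    "katakana": re.compile(r"[\u30A1-\u30FA\u30FC]+"),
    "alphanumeric": re.compile(r"[0-9A-Za-z]+"),
}

# String match modes from the Search/Sort menu.
MATCHERS = {
    "partial": lambda word, text: text in word,
    "exact": lambda word, text: word == text,
    "prefix": lambda word, text: word.startswith(text),
    "suffix": lambda word, text: word.endswith(text),
}


def search(words, text=None, mode="partial", word_type=None):
    """Narrow a word list by word type and by string match."""
    result = list(words)
    if word_type is not None:
        pattern = WORD_TYPE_PATTERNS[word_type]
        # fullmatch keeps only words made up entirely of the given type.
        result = [w for w in result if pattern.fullmatch(w)]
    if text:
        match = MATCHERS[mode]
        result = [w for w in result if match(w, text)]
    return result


def sort_alphabetical(words):
    # Python compares strings by Unicode code point.
    return sorted(words)


words = ["printer", "print server", "reprint", "ぷりんた", "プリンタ", "印刷"]
print(search(words, "print", "partial"))   # ['printer', 'print server', 'reprint']
print(search(words, "print", "prefix"))    # ['printer', 'print server']
print(search(words, "print", "suffix"))    # ['reprint']
print(search(words, "printer", "exact"))   # ['printer']
print(search(words, word_type="katakana")) # ['プリンタ']

# Small kana sort before the regular character, voiced kana after it.
print(sort_alphabetical(["が", "か", "ア", "あ", "ぁ", "漢"]))
# ['ぁ', 'あ', 'か', 'が', 'ア', '漢']
```

The Exact mode corresponds to the check suggested above for finding out whether a particular keyword is already registered.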
4.6 Adding Keywords from the List of Candidate Words
To use particular candidate words as keywords, follow the procedures below.
Adding candidate words as keywords:
- Select the check box of the candidate word to be added. You can select
multiple candidate words at the same time.
- Click the arrow button to add the candidate words.
The newly added keywords are shown and highlighted at the top of the Registered
keyword list.
After adding candidate words as keywords:
Hint: Clicking the Select All button above the Candidate word list selects all
of the check boxes. This is a useful function when you want to add many
keywords at the same time. After you click the Select All button, this
button changes into the Cancel All button, and clicking this button will
clear all of the selected check boxes.
4.7 Adding New Keywords by Entering Character Strings
To add keywords by directly entering character strings instead of selecting
them from the candidate word list, follow the procedures below.
Adding a new keyword:
- Click the New Keyword button of the Registered keyword list.
- When the dialog box appears, type the keyword.
- Click OK.
Notes:
- The newly added keyword is shown and highlighted at the top of the Registered
keyword list.
- If the keyword that you typed in the dialog box has already been registered,
the keyword will not be added but the already registered keyword will be
displayed and highlighted at the top of the Registered keyword list.
- If you type a synonym for an already registered keyword in the dialog box,
it is added as a new keyword.
- The keyword that you type will not be added if you click the Cancel button
in the dialog box.
4.8 Deleting Keywords
To delete keywords, follow the procedures below.
Deleting keywords:
- Select the check box of the keyword in the Registered keyword list that
you want to delete. You can select multiple keywords simultaneously.
- Click the arrow button for deleting keywords.
Notes:
- If the deleted keyword is not a synonym of a different keyword, the deleted
keyword is shown and highlighted at the top of the Candidate word list.
- If the deleted keyword exists as a synonym of a different keyword, the
deleted keyword will not be shown on the Candidate word list.
- Clicking the Select All button above the Registered keyword list selects
all of the check boxes. This is a useful function when you want to delete many
keywords at the same time. After you click the Select All button, this
button changes into the Cancel All button, and clicking this button will
clear all of the selected check boxes.
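The rules in the notes of 4.7 and 4.8 can be captured in a short Python model. The class KeywordFile and its fields are hypothetical and simplify the real keyword file; in particular, what happens to the synonyms of a deleted keyword is not stated in this guide, so the sketch simply drops them.

```python
from typing import Dict, List, Set


class KeywordFile:
    """Toy model of keyword add and delete rules (hypothetical, not product code)."""

    def __init__(self) -> None:
        self.keywords: Dict[str, Set[str]] = {}  # keyword -> its synonyms
        self.candidates: List[str] = []          # candidate word list, newest first

    def add_keyword(self, word: str) -> bool:
        # An already registered keyword is not added a second time.
        if word in self.keywords:
            return False
        # A word that is only a synonym of another keyword is still added
        # as a new keyword.
        self.keywords[word] = set()
        return True

    def delete_keyword(self, word: str) -> None:
        self.keywords.pop(word)
        # The deleted word returns to the top of the candidate word list
        # unless it still exists as a synonym of a different keyword.
        if not any(word in synonyms for synonyms in self.keywords.values()):
            self.candidates.insert(0, word)


kf = KeywordFile()
print(kf.add_keyword("laptop"))    # True
print(kf.add_keyword("laptop"))    # False: already registered
kf.keywords["laptop"].add("notebook")
print(kf.add_keyword("notebook"))  # True: synonym entered as a new keyword
kf.delete_keyword("notebook")
print(kf.candidates)               # []: still a synonym of "laptop"
kf.delete_keyword("laptop")
print(kf.candidates)               # ['laptop']
```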
4.9 Editing Synonyms
To edit synonyms of a particular keyword, click the Edit button to the
right of the keyword that you want to edit in the Registered keyword list.
Edit synonyms:
Click the Edit button to open the Edit synonym screen.
Edit synonym screen:
(1) Use these radio buttons to select a synonym to be regarded as a keyword
(standard form). If the identical character string has been registered
as a synonym of a different keyword, that character string (candidate for
a synonym) will appear on a different screen, and the radio buttons operate
accordingly. The system operates this way in order to let users know that
a word that is registered as a synonym of a different keyword can be separately
registered as a keyword.
(2) Use these check boxes to select which words are to be used as synonyms.
The check box will be automatically checked for the one with the radio
button checked in the Keyword column.
(3) This area shows choices (candidates) for the keyword and synonyms.
(4) This area shows types of synonym candidates. The meaning of each type
is as follows:
Type | Meaning
Current keyword | A keyword for which the Edit button is clicked in the Edit keyword screen.
Current synonym | A synonym that is currently registered as a synonym of the current keyword.
Unused synonym candidate | Among the synonym candidates for the current keyword and the current synonym, a word that is currently registered as a separate keyword or a keyword to which a word registered as a separate synonym belongs.
Aforementioned synonym | A word that is registered in the candidate word file as a synonym candidate, or a newly added synonym which is currently registered as a synonym of a different keyword.
Unregistered synonym | A word that is registered in the candidate word file as a synonym candidate, or a newly added synonym which is not yet registered as a keyword or synonym.
(5) Click OK to apply the synonym settings to the Edit keyword screen.
The keyword file is not yet saved when you click OK. To save the keyword
file, you must save it in the Edit keyword screen.
Edit keyword screen after editing synonyms:
In the Edit keyword screen, synonyms are listed to the right of the equal
sign.
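Selecting a synonym as the keyword (standard form) amounts to swapping the two roles. The following Python sketch illustrates the swap and the "keyword = synonyms" display; the function names are hypothetical, and the comma used to separate synonyms in the formatted line is an assumption, since this guide states only that synonyms appear to the right of the equal sign.

```python
from typing import List, Tuple


def promote_synonym(
    keyword: str, synonyms: List[str], new_keyword: str
) -> Tuple[str, List[str]]:
    """Make `new_keyword` the keyword and demote the old keyword to a synonym."""
    if new_keyword not in synonyms:
        raise ValueError(f"{new_keyword!r} is not a synonym of {keyword!r}")
    # The old keyword takes the place of the promoted synonym in the list.
    swapped = [keyword if s == new_keyword else s for s in synonyms]
    return new_keyword, swapped


def format_entry(keyword: str, synonyms: List[str]) -> str:
    # Synonyms are shown to the right of the equal sign.
    return f"{keyword} = {', '.join(synonyms)}" if synonyms else keyword


keyword, synonyms = promote_synonym("notebook", ["laptop", "portable PC"], "laptop")
print(format_entry(keyword, synonyms))  # laptop = notebook, portable PC
```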
4.10 Registering New Synonyms
To register new synonyms by entering character strings in the edit synonym
screen, follow the procedures below.
Adding a new synonym:
- Click the New Synonym button to open the dialog box for entering a synonym.
- Type a synonym in the dialog box.
- Click OK.
After adding a new synonym:
The specified character string is added as a synonym candidate with the
Synonym check box checked. Click the OK button at the bottom of the screen
to add it as a new synonym. The keyword file is not saved at this point;
therefore, it is necessary to save it in the Edit keyword screen.
4.11 Category Tree Display Mode
The structure of the Edit keyword screen in the category tree display mode
is as follows.
Edit keyword screen in the category tree display mode:
The difference between this mode and the candidate words display mode (see
4.4) is that in this mode, the category tree is displayed instead of the
candidate word list.
4.12 Category Search
In the category tree display mode of the Edit keyword screen, you can specify
a category to search registered keywords.
Category search:
(1) When the category name is clicked, the message "Selected"
appears for that category.
(2) Keywords listed in the Registered keyword list are narrowed to the
keywords registered in the specified category. This function can be used
with other search or sorting functions.
(3) Click Reset Category Search to cancel the search and restore
the original list.
4.13 Registering Keywords in a Category
To register a keyword in a category, follow the procedures below.
Registering a keyword in a category:
- In the Registered keyword list, select the check box of a keyword that
you want to register in a particular category. You can select multiple
keywords simultaneously.
- Click the Add button to the right of the category name to register the
selected keyword in that category.
After the keyword is registered, the category name appears in the Category
area in the Registered keyword table.
After registering a keyword in a category:
Click the Remove button on the lower right side of the category name to
cancel the category registration.
4.14 Saving and Exiting Edit Mode
After editing the keywords, you must run the termination processing regardless
of whether you made changes such as adding or deleting keywords, and
regardless of whether the changes need to be saved. If the screen is closed
while the edit operation continues, the keyword file remains locked, and
other users must interrupt your edit operation before they can edit the
category tree or keywords.
Keyword file save/quit menu:
(1) Save the current changes and continue the operation. The file stays
locked; therefore, the termination processing (2) or (3) is necessary.
(2) Save the changes and exit the keyword edit mode. The file will be unlocked.
(3) Exit the keyword edit mode without saving changes. The file will be
unlocked.
Screen shown after saving and exiting the edit mode:
(1) The termination message appears.
(2) The file is unlocked, and Edit Category Tree, Edit Keywords and Edit Rules become
active again.
Terms of Use
Notices
This information was developed for products and services offered in the
U.S.A.
IBM may not offer the products, services, or features discussed in this
document in other countries. Consult your local IBM representative for
information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to
state or imply that only that IBM product, program, or service may be used.
Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However,
it is the user's responsibility to evaluate and verify the operation of
any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant
you any license to these patents. You can send license inquiries, in writing,
to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact
the IBM Intellectual Property Department in your country or send inquiries,
in writing, to:
IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan
The following paragraph does not apply to the United Kingdom or any other
country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION
"AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not
allow disclaimer of express or implied warranties in certain transactions,
therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical
errors. Changes are periodically made to the information herein; these
changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s)
described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for
convenience only and do not in any manner serve as an endorsement of those
Web sites. The materials at those Web sites are not part of the materials
for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way
it believes appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the
purpose of enabling: (i) the exchange of information between independently
created programs and other programs (including this one) and (ii) the mutual
use of the information which has been exchanged, should contact:
IBM Corporation
Silicon Valley Lab
Building 090/H-410
555 Bailey Avenue
San Jose, CA 95141-1003
U.S.A.
Such information may be available, subject to appropriate terms and conditions,
including in some cases, payment of a fee.
The licensed program described in this document and all licensed material
available for it are provided by IBM under terms of the IBM Customer Agreement,
IBM International Program License Agreement or any equivalent agreement
between us.
Information concerning non-IBM products was obtained from the suppliers
of those products, their published announcements or other publicly available
sources. IBM has not tested those products and cannot confirm the accuracy
of performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to
the suppliers of those products.
All statements regarding IBM's future direction or intent are subject to
change or withdrawal without notice, and represent goals and objectives
only.
This information contains examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples
include the names of individuals, companies, brands, and products. All
of these names are fictitious and any similarity to the names and addresses
used by an actual business enterprise is entirely coincidental.
Copyright License
This information contains sample application programs in source language,
which illustrate programming techniques on various operating platforms.
You may copy, modify, and distribute these sample programs in any form
without payment to IBM, for the purposes of developing, using, marketing
or distributing application programs conforming to the application programming
interface for the operating platform for which the sample programs are
written. These examples have not been thoroughly tested under all conditions.
IBM, therefore, cannot guarantee or imply reliability, serviceability,
or function of these programs.
Trademarks
This topic lists IBM trademarks and certain non-IBM trademarks.
For information about IBM trademarks, see the trademark information published by IBM.
The following terms are trademarks or registered trademarks of other companies:
Java and all Java-based trademarks and logos are trademarks or registered
trademarks of Sun Microsystems, Inc. in the United States, other countries,
or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of
Microsoft Corporation in the United States, other countries, or both.
Intel, Intel Inside (logos), MMX and Pentium are trademarks of Intel Corporation
in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and
other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries,
or both.
Other company, product or service names might be trademarks or service
marks of others.