A fast information retrieval system does not sequentially scan through text documents; this would take too long. Instead, it operates on a previously built text index. You can think of a text index as consisting of significant terms extracted from the text documents, each term stored together with information about the document that contains it.
A text index contains only relevant information; insignificant words, such as "and", "of", and "which", are not indexed. Text Extender uses a list of these words, known as stop words, to prevent them from being indexed. The retrieval system searches the index for the requested terms to find which text documents contain them.
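The idea of an index that skips stop words can be illustrated with a minimal Python sketch. It is not Text Extender's implementation: the stop-word set, the tokenizer, and the names `build_index` and `search` are assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list for illustration; not Text Extender's actual file.
STOP_WORDS = {"and", "of", "which", "the", "a", "is"}

def build_index(documents, stop_words=STOP_WORDS):
    """Map each significant term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Simplistic tokenizer (an assumption): lowercase runs of letters and digits.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            if term not in stop_words:
                index[term].add(doc_id)
    return index

def search(index, term):
    """Look the term up in the index instead of scanning the documents."""
    return sorted(index.get(term.lower(), set()))

documents = {
    1: "The history of text retrieval",
    2: "Indexing and retrieval of documents",
}
index = build_index(documents)
print(search(index, "retrieval"))  # [1, 2]
print(search(index, "of"))         # [] because "of" is a stop word
```

A search only touches the index, so its cost does not depend on the length of the documents themselves.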
Tip |
---|
If you need to modify the list of stop words, do it only once, and at installation time. |
A list of stop words per language is stored in a file that you can modify (see Modifying the stop-word and abbreviation files), but, because there is one file for the whole system, you should change it only once while you are setting up Text Extender for the first time. If you change the file later, existing indexes will not reflect the change.
As an example, let's say that some documents contain the name of a weekly magazine called "Now". If you remove this word from the stop words, it will be indexed and can be found by future searches. However, any indexes created before you removed the stop word will not contain the word "now", and a search for it will be unsuccessful.
If you do decide to change the stop words, and you want this change to be reflected throughout, you must recreate all your indexes.
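The effect of changing the stop words after an index exists can be sketched in Python. This is only an illustration under assumed names (`build_index`, the sample document, and the stop-word set are all hypothetical), not how Text Extender stores its indexes.

```python
import re

def build_index(documents, stop_words):
    """Map each term that is not a stop word to the ids of documents containing it."""
    index = {}
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            if term not in stop_words:
                index.setdefault(term, set()).add(doc_id)
    return index

documents = {1: "An interview published in Now magazine"}
stop_words = {"an", "in", "now"}  # hypothetical initial stop-word list

# Index created while "now" is still a stop word.
old_index = build_index(documents, stop_words)

# Removing the stop word later does not touch the existing index.
stop_words.discard("now")
print("now" in old_index)  # False

# Only recreating the index picks up the change.
new_index = build_index(documents, stop_words)
print(new_index["now"])    # {1}
```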
Indexing is a two-step process. The first step is to record in a log table the text documents that need to be indexed. This occurs automatically through DB2 triggers whenever you insert, update, or delete a text document in a column.
The second step is to index the text documents listed in the log table. This may be done periodically. The terms of those documents that were inserted or changed in the column are added to the index. The terms of those documents that were deleted from the column are removed from the index.
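The two steps can be sketched in Python as follows. This is a simplified model under stated assumptions: the class `IndexedColumn`, its methods, and the in-memory list standing in for the log table are hypothetical, and the explicit `log.append` calls play the role that DB2 triggers play in the real system.

```python
import re
from collections import defaultdict

STOP_WORDS = {"and", "of", "which", "the"}  # hypothetical stop-word list

def significant_terms(text):
    """Return the set of terms in the text that are not stop words."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS}

class IndexedColumn:
    """Toy model of a text column with a log table and a text index."""

    def __init__(self):
        self.rows = {}                 # doc_id -> text currently in the column
        self.log = []                  # doc ids waiting to be indexed (the "log table")
        self.index = defaultdict(set)  # term -> ids of documents containing it
        self._indexed = {}             # doc_id -> terms currently held in the index

    # Step 1: every change to the column is recorded in the log.
    def insert(self, doc_id, text):
        self.rows[doc_id] = text
        self.log.append(doc_id)

    def update(self, doc_id, text):
        self.rows[doc_id] = text
        self.log.append(doc_id)

    def delete(self, doc_id):
        self.rows.pop(doc_id, None)
        self.log.append(doc_id)

    # Step 2: run periodically to bring the index up to date.
    def update_index(self):
        for doc_id in dict.fromkeys(self.log):  # each logged document once, in order
            # Remove the terms previously indexed for this document.
            for term in self._indexed.pop(doc_id, set()):
                self.index[term].discard(doc_id)
                if not self.index[term]:
                    del self.index[term]
            # Add the current terms if the document still exists in the column.
            if doc_id in self.rows:
                terms = significant_terms(self.rows[doc_id])
                for term in terms:
                    self.index[term].add(doc_id)
                self._indexed[doc_id] = terms
        self.log.clear()

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))

col = IndexedColumn()
col.insert(1, "Indexing of text documents")
print(col.search("indexing"))  # [] : logged, but not yet indexed
col.update_index()
print(col.search("indexing"))  # [1]
col.delete(1)
col.update_index()
print(col.search("indexing"))  # []
```

The sketch also shows a consequence of the two-step design: a document cannot be found by a search until the second step has processed its log entry.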
Figure 4. Indexing only significant terms