This section describes how to prepare a database so that its text tables can be searched by Text Extender. The steps are:
If you have just installed Text Extender, do these steps in sequence.
Summary |
---|
|
When you create an index, the following parameters described in Text configuration settings, are set for that index:
When Text Extender is first installed, default values for these settings are established in the text configuration. To display the current text configuration values, see Displaying the text configuration settings.
To change the text configuration to be used as default values when indexes are created, enter:
db2tx "CHANGE TEXT CFG USING settings"
To change the default index type and the default index directory for future indexes:
db2tx "CHANGE TEXT CONFIGURATION USING INDEXTYPE precise INDEXOPTION normalized DIRECTORY DB2TX_INSTOWNER/db2tx/indexes"
To change the default update frequency for indexes so that they are updated at 12:00 or 15:00, on Monday to Friday, if there is a minimum of 100 text documents queued:
db2tx "CHANGE TEXT CONFIGURATION USING UPDATEFREQ min(100) d(1,2,3,4,5) h(12,15) m(00)"
To stop the periodic updating of an index:
db2tx "CHANGE TEXT CONFIGURATION USING UPDATEFREQ none"
Summary |
---|
|
There is one stop-word file and one abbreviation file per language. To understand the implications of editing these files, see Why text documents need to be indexed.
Tip |
---|
Before you begin editing one of these files, make a backup copy. |
The stop word and abbreviation files on OS/2 and Windows systems are in:
drive:\dmb\db2tx\dict
On AIX, HP-UX, and SUN-Solaris systems, they are in:
DB2TX_INSTOWNER /db2tx/dicts
Use your own editor to edit these files. They use CCSID 850, so ensure that your application CCSID is also set to 850 before you begin.
Remove words and abbreviations that you want to be indexed. Add words that you do not want to be indexed.
Summary |
---|
|
You can restrict a search to particular sections of documents. This concept, and the use of a document model file to enable Text Extender to recognise such sections, is described in Working with structured documents.
When a server instance is created, an example of a document model file desmodel.ini is created in the server instance directory. There is also an example of a document model for HTML documents in the same file.
Use your own editor to edit document model files.
Later, when you enable the text column that contains the documents, you must specify INDEXPROPERTY SECTIONS_ENABLED.
Summary |
---|
|
This section describes how to create table DB2TX.SAMPLE that is used in many of the examples in this chapter, and in many of the examples in Chapter 5, Searching with Text Extender UDFs.
Alternatively, you can run TXVERIFY and TXSAMPLE to automatically create a sample table to be enabled for each index type. See Preparing a sample database containing enabled text. If you do this, however, you can no longer use the DB2TX.SAMPLE table name as shown in the examples of administration commands to enable a text table and to enable a text column. If you decide not to use TXSAMPLE to automatically create the sample table, make and run a command script to do the following:
db2 "CREATE TABLE db2tx.sample ( docid VARCHAR(30), author VARCHAR(50), subject VARCHAR(100), date TIMESTAMP, comment LONG VARCHAR)"
db2 "import from \"dessamp.imp\" of del modified by DELPRIORITYCHAR insert into db2tx.sample"
The file DESSAMP.IMP is in the /samples subdirectory in the directory in which Text Extender is installed.
Text Extender offers utilities for preparing a sample database. They are useful for quickly preparing text for testing Text Extender's search capabilities immediately after installation, and for general test purposes at any time after that.
These utilities are in the following directories:
where DMBMMPATH is the installation path.
To prepare a sample database, either at the server or at a client workstation:
TXVERIFY database-name [user-id] [password]
This command creates and enables a sample database.
TXSAMPLE database -name [user-id] [password]
The user ID and password are required only if you are working from a client workstation.
This command does the following:
Summary |
---|
|
To enable the connected database , enter:
db2tx "ENABLE DATABASE"
This command takes no parameters. It prepares a database for use by Text Extender.
This command also declares user-defined functions (UDFs) and user-defined distinct types (UDTs) to DB2. These are the SQL functions that you use later to search for text. They are described in Chapter 9, UDTs and UDFs. These declarations apply to all future sessions.
A catalog view, TEXTINDEXES, is created that keeps track of enabled text columns. See Working with the Text Extender catalog view.
This command creates text configuration information for the database, containing default values for index, text, and processing characteristics. They are described in Text configuration settings.
![]() |
Once a database has been enabled, it remains so until you disable it. To reverse the changes made by ENABLE DATABASE, refer to Disabling a database.
Tips |
---|
If the environment variable DB2TX_INSTOWNER is used, it must be set to the name of the instance owner before the database is enabled. This is particularly important for UNIX users because, in UNIX, this variable is set by default. If you later decide to drop an enabled database, you should first disable it to ensure that the declared UDFs, the catalog view, and so on, are removed. |
Summary |
---|
|
This step determines whether you have one common index for all the text columns in the table, or several indexes, that is, a separate index for each text column. See Creating one or several text indexes for a table for further information.
To have a common index, run ENABLE TEXT TABLE, then run ENABLE TEXT COLUMN for each text column. To have separate indexes, skip ENABLE TEXT TABLE, and run only ENABLE TEXT COLUMN for each text column. This is illustrated later in this topic.
During this step, Text Extender creates an empty text index that is common to all subsequently enabled text columns. You specify the type of index, how frequently the index is to be updated, and in which directory the index is to be stored. Default values for any parameters that you do not specify are taken from the text configuration settings.
The examples in other chapters assume that the index type is dual.
Tip |
---|
If a setting, such as the index update frequency, should be the same for most text tables, it may be more convenient to use text configuration information to specify default settings. See CHANGE TEXT CONFIGURATION. |
This step also creates an empty log table for recording which documents in the table are added, changed, or deleted. Triggers are created to keep the log table updated.
You cannot run ENABLE TEXT TABLE for a table that already contains a text column that has been enabled for Text Extender.
To delete an index created by ENABLE TEXT TABLE, see Disabling a text table.
Tip |
---|
If you later decide to drop an enabled text table, you should first disable it to ensure that the index, the log table, and so on, are removed. |
The following example enables text table DB2TX.SAMPLE:
db2tx "ENABLE TEXT TABLE db2tx.sample"
Default values for the index characteristics are taken from the text configuration settings.
The next example explicitly sets the characteristics of the common index that is created for the table.
For a UNIX server:
db2tx "ENABLE TEXT TABLE db2tx.sample INDEXTYPE linguistic UPDATEFREQ min(100) d(1,2,3,4,5) h(12,15) m(00) DIRECTORY DB2TX_INSTOWNER/db2tx/indexes"
For an OS/2, or Windows NT server:
db2tx "ENABLE TEXT TABLE db2tx.sample INDEXTYPE linguistic UPDATEFREQ min(100) d(1,2,3,4,5) h(12,15) m(00) DIRECTORY \dmb\db2tx\indexes"
The example sets the index type and the index directory, and then sets the index update frequency so that the index is updated at 12:00 or 15:00, on Monday to Friday, if there is a minimum of 100 text documents queued.
Figure 8. Creating a common index for all text columns in a table
![]() |
Figure 9. Creating a separate index for each text column
![]() |
Summary |
---|
|
Tip |
---|
If a setting, such as the index update frequency, should be the same for most text columns, use the text configuration information to specify default settings. |
To reverse the changes made by ENABLE TEXT COLUMN, use the DISABLE TEXT COLUMN command. To disable all enabled text columns in a table, use the DISABLE TEXT TABLE command.
When you enable a text column, a handle column is added to the table, the document information (format, language, CCSID) is set, a log table is created, and an index is created,
During this step, Text Extender adds to the table a 60-byte VARCHAR handle column - a column that contains handles associated with the text column that is being enabled. Handles contain information about the text in the associated text column and in the associated external files. This information includes a unique document ID, the document's language, format, and CCSID, and the index name. They are described in The sample table DB2TX.SAMPLE.
Tip |
---|
This version of Text Extender cannot work with indexes created using an earlier release. You must disable and then re-enable all Text Extender-enabled databases and tables. |
The column containing text blocks is COMMENT. Before you can search through the text in this column, you must prepare the database and the COMMENT column for use by Text Extender.
After this preparation step, the DB2TX.SAMPLE table contains an additional column for handles.
Note: | When you subsequently search for text, you specify the handle column, not the text column, as the column to be searched. |
You specify the type of text documents you typically store in this text column: their format (such as ASCII), their language, and their CCSID. Defaults for this information can be specified in the text configuration settings. See Text configuration settings.
During this step, a log table and a view called LOGIXnnnnnn is created, where IXnnnnnn is the index name (available from the catalog view). If a default tablespace is specified in text configuration, the log table is stored there; otherwise, it is stored in the DB2 system default tablespace. To optimize performance and the use of disk space, you can specify a different tablespace to be used for the log tables.
Triggers are also created that add information to the log table whenever a document in the column is added or changed. This information causes these documents to be indexed the next time indexing takes place.
If external files are added or changed, these triggers are not aware of the changes. In such cases, to cause the triggers to add the information to the log table, use an UPDATE statement as shown in the example in Updating an index for external files.
If errors occur during indexing, such as when a document queued for indexing could not be found, so-called error events are added to the log table and can be displayed, as described in Displaying error events.
Tip |
---|
If you run out of log space in this step, see Enabling a text column in a large table for possible solutions. |
In partitioned databases, each table is assigned to a tablespace and a nodegroup. It is important that the log table is assigned to a tablespace that belongs to the same nodegroup as the enabled user table. Text Extender checks this during the ENABLE command.
If you intend to have a separate index for each text column, that is, you have skipped the step ENABLE TEXT TABLE, Text Extender creates a separate index for the text column during this step. You specify the type of index, how frequently the index is to be updated, and in which directory the index is to be stored. If, on the other hand, you intend to have one index for the whole table, then you have already run ENABLE TEXT TABLE and specified the index parameters; they are ignored if you repeat them here.
Use the UPDATEINDEX keyword to determine whether the indexing of the text documents in the specified text column begins immediately, or when periodic indexing is next scheduled. If you do not use this keyword, the value specified in the text configuration settings is taken.
Creating indexes of various types for a text column. You can create more than one index for a text column. This can be useful if you want to allow, for example, linguistic and fuzzy search on the same text column, by associating it with different index types, such as linguistic and Ngram indexes. You do this by running ENABLE TEXT COLUMN again, specifying not only the additional type of index to be created, but also a unique handle column name.
Tip |
---|
If you are using a partitioned database, and you decided to change the configuration of a nodegroup and start a redistribution of the table rows, you must recreate the text indexes and log tables, that is, you must disable and enable again all Text Extender-enabled tables of that nodegroup. |
If you are working with a table that has a large row length, keep in mind that enabling a text column adds a handle column of type DB2TEXTH (VARCHAR 60). Similarly, enabling an external file adds a handle column of type DB2TEXTFH (VARCHAR 210). This could be significant if the table is approaching its maximum row length as determined by DB2.
Also when you enable a text column in large table, use the DB2 UDB REORGANIZE utility to check whether the table needs to be reorganized. When you enable a large table for the first time, the following steps make indexing faster:
When you enable a text column or external files, Text Extender adds a handle column to the table and initializes the handle values, thereby causing DB2 UDB log entries to be written. If there is an unusually large number of log entries to be written, DB2 UDB can run out of log space.
There are two ways to handle this situation; the first is better for performance reasons:
LOGSECOND 50
Ensure that the sum of LOGPRIMARY and LOGSECOND is not greater than 128. You should also increase the application heap size.
APPLHEAPSZ 512
The following example enables text column COMMENT in table DB2TX.SAMPLE, and assigns the name COMMENTHANDLE to the handle column that is created:
db2tx "ENABLE TEXT COLUMN db2tx.sample comment HANDLE commenthandle"
Default values for the text information and for the index characteristics are taken from the text configuration settings.
The next example explicitly sets the values for the type of documents that are in the COMMENT column. Default values for the index characteristics are taken from the text configuration settings.
db2tx "ENABLE TEXT COLUMN db2tx.sample comment HANDLE commenthandle CCSID 819 LANGUAGE uk_english FORMAT rft"
The next example explicitly sets the values for the characteristics of the index that is created for the COMMENT column. The example sets the index type and the index directory, and sets the index update frequency so that the index is updated at 12:00 or 15:00, on Monday to Friday, if there is a minimum of 100 text documents queued. Default values for the text information are taken from the text configuration settings.
For a UNIX server:
db2tx "ENABLE TEXT COLUMN db2tx.sample comment HANDLE commenthandle INDEXTYPE linguistic UPDATEFREQ min(100) d(1,2,3,4,5) h(12,15) m(00) UPDATEINDEX UPDATE DIRECTORY DB2TX_INSTOWNER/db2tx/indexes"
For an OS/2 or Windows NT server:
db2tx "ENABLE TEXT COLUMN db2tx.sample comment HANDLE commenthandle INDEXTYPE linguistic UPDATEFREQ min(100) d(1,2,3,4,5) h(12,15) m(00) UPDATEINDEX UPDATE DIRECTORY \dmb\db2tx\indexes"
Text columns must be CHAR, GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC, DBCLOB, VARCHAR, LONG VARCHAR, or CLOB to be enabled by Text Extender. If the documents are in a column of a different type, such as a user-defined distinct type (UDT), you must provide a user-defined function that takes the user type as input and provides as output type CHAR, GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC, DBCLOB, VARCHAR, LONG VARCHAR, or CLOB.
Use the FUNCTION keyword in ENABLE TEXT COLUMN to specify the name of this function.
Example: You intend to store compressed text in a table.
db2 CREATE DISTINCT TYPE COMPRESSED_TEXT AS CLOB(1M)
db2 CREATE TABLE MYTEXT (author VARCHAR(50), text COMPRESSED_TEXT) db2 INSERT ...
To enable the text column for use by Text Extender:
db2tx "ENABLE TEXT COLUMN MYTABLE text FUNCTION uncompress HANDLE handle ..."
If the text column that you are enabling belongs to a table that is part of a multiple-node nodegroup, the index directory that you specify must be available on all physical nodes. If you use the default directory specified in the text configuration, make sure that the path is available on all nodes of the nodegroup. If this is not convenient, you can specify a specific path for each node in the ENABLE TEXT COLUMN command.
If you change the node configuration of a nodegroup that contains a table that is enabled for Text Extender, you must reindex the table. You can check if the node configuration has been changed by using the command TXNCHECK.
Tip |
---|
If you are using DB2 UDB Extended Enterprise Edition, and have tables partitioned among several nodes, there may be a difference between the system time of the nodes (when one of the nodes is on summer time, for example). To prevent problems, ensure that the difference between the system time settings of the nodes is within the value of DB2's MAX_TIME_DIFF setting. In Windows NT, use the NET TIME command to synchronize the system time settings. |
A structured document is one that contains sections, such as the document title, author, and subject. You can limit the scope of a search to a particular section of documents.
Structured flat file, XML, and HTML documents are supported. A flat file with marked up sections could look like this:
<title>IBM Dictionary of Computing <author>McDaniel, George <subject>Computers, Reference, ...
An XML document might look like this:
<?xml version="1.0"?> <!DOCTYPE LETTER SYSTEM "letter.dtd"> <LETTER> <HEADER> This tag has been skipped in the definition, so this text will be added to the section named LETTER. </HEADER> <ADDRESS> Text will added to section named LETTER/address. <CITY> Text will added to section named LETTER/address/City. </CITY> </ADDRESS> <CONTENT> Text will be added section named LETTER/Content. <NOSECTION> Text will be added to section named LETTER/Content since element "NOSECTION" was not defined. </NOSECTION> <GREETINGS> Text will be added section named LETTER/Content/Greetings. </GREETINGS> </CONTENT> </LETTER>
Restrictions:
To make Text Extender aware of such sections when indexing, you must create a document model containing descriptive section names of your choice for use in queries against that section, and the markup tags that identify the sections. Often, you will specify a section name that is the same as the tag name:
Section Tag Title title Author author Subject subject
The document model file. You describe the document model in a document model ini file. Tags that are not defined in the document model file are indexed according to the index type. There is one document model file for each Text Extender server instance. As you can see from the following example, a document model file can contain more than one document model.
;Comments must be preceded by a semicolon [MODELS] modelname=sample modelname=play [sample] title=title author=author subject=subject content=content [play] play=play author=author title=title scene=scene
For XML documents, the model name defined in the document models file must be the same as the section name identifier and the section name tag of the root element. The names are case-sensitive.
For section-enabled indexes, well-formed XML documents containing root elements with names matching the name of a valid document model will be indexed according to that document definition. If no matching document model is found, there will be an error message and the document will not be indexed. For indexes that are not section-enabled, all at least well-formed XML documents will be indexed and the section name tags will be ignored.
Here is an example of a model definition for documents in XML format. It contains nested sections, that is, sections within other sections.
; list of document models [MODELS] modelname = LETTER ; sample for XML document model definition ; left side = section name identifier encoding whole path ; right side = section name tags specifying tag for each element ; of the path through the tree down to specified ; node. Tag delimiter is '/'. [LETTER] LETTER = LETTER LETTER/address = LETTER/ADDRESS LETTER/Content = LETTER/CONTENT LETTER/address/City = LETTER/ADDRESS/CITY LETTER/Content/Greetings = LETTER/CONTENT/GREETINGS
The Letter model entry shows how to model sections to be stored in the index for documents in XML format. Information on the model tree is given on the right-hand side of section entries: each character string denotes an XML element, the "/" delimiter denotes the element hierarchy. The definition given in the configuration file must be a subset of the model as defined in the DTD (document type definition) file for a given document.
The first section entry for a model to be used for XML documents must be for the root element. The complete hierarchy of the model must be defined up to the last element defined as a section. Ordering must be "breadth first". That is, all elements of level n of the model hierarchy which will be used for indexing must be declared before any element of level n+1. You cannot skip the entry "LETTER/Content = LETTER/CONTENT" if you want to use "LETTER/Content/Greetings = LETTER/CONTENT/GREETINGS" as a section in the index. However, you can skip whole branches of the hierarchy: text contained in these will be treated as belonging to the root element.
Restrictions: Searching in nested sections is possible only in XML documents. For Ngram indexes, only one section name can be searched and it must not be a nested section.
When a server instance is created, an example of a document model file, desmodel.ini is created in the server instance subdirectory. There is also an example of a document model file for HTML documents in the same directory. HTML documents cannot contain nested sections.
To enable section support, define document models in desmodel.ini, and enable the text column that contains the documents using INDEXPROPERTY SECTIONS_ENABLED. See ENABLE TEXT COLUMN for details.
Tips |
---|
|
Summary |
---|
|
Text Extender can search not only in text stored in DB2 UDB tables, but also in text documents stored in files. This preparation step is needed if you intend to search for text in external files. The table associated with the external text files must not have been enabled by the command ENABLE TEXT TABLE.
An index is created, a log table is created, and the document information is set, in the same way as described in Enabling a text column.
Tips |
---|
|
A handle column of type DB2TEXTFH is added to an existing DB2 UDB table. The handle column will hold the references for the external files, each handle containing index and the document information (CCSID, format, and language).
See Handles for external files for a description.
You can specify additional parameters, such as the default index characteristics, in the same way as for enabling a text column.
After the index has been created, you can move or delete the external files. You can still search on the files. You can insert new rows in the table and use UPDATE INDEX to update the index with the new file references.
If the table you are enabling is using a nodegroup with multiple physical nodes, make sure that the external files you are referencing in the columns of your table are located on the node where the table partition resides.
db2tx "ENABLE TEXT FILES db2tx.extfile HANDLE filehandles INDEXTYPE linguistic UPDATEFREQ min(100) d(1,2,3,4,5) h(12,15) m(00) UPDATEINDEX NOUPDATE DIRECTORY \any\db2tx\indices"
db2 INSERT INTO db2tx.EXTFILE (FILEHANDLES) VALUES (db2tx.INIT_TEXT_HANDLE (850, 'TDS' 'US_ENGLISH', 'd:\dmb\samples\tx')
db2 UPDATE db2tx.EXTFILE SET FILEHANDLES = db2tx.file(FILEHANDLES,'d:\dmb\samples\tx') WHERE DOCID = 'doc1'
Tip |
---|
Do not use INIT_TEXT_HANDLE for updating handle columns that refer to external files. |
db2tx "UPDATE INDEX db2tx.extfile HANDLE filehandles"
You have now completed the steps to prepare your text documents to be searched.
If you specified NOUPDATE for the UPDATEINDEX keyword when you enabled the text column, Text Extender does not index the text immediately, but waits for the next periodic indexing. To update the index now, see Updating an index.
When indexing of the documents has finished, you can begin retrieving information as described in Chapter 5, Searching with Text Extender UDFs.
Enter QUIT to end the Text Extender command processor.
Tip |
---|
Use GET INDEX STATUS to determine when indexing has finished. |