Each text document that you intend to search has three characteristics that are significant to Text Extender:
Format
Language
Coded Character Set Identifier (CCSID).
Text Extender needs to know the format (or type) of text documents, such as WordPerfect or ASCII, that you intend to search. This information is needed when indexing text documents.
The text document types supported are:
For nonsupported document types, specify a numeric ID. Valid values are 1 to 100. This value is passed as the source format to the user exit that converts the original format to TDS.
If, during indexing, Text Extender encounters a document that is not one of the supported types, it provides a user exit that writes the document to disk and calls a program that you provide to extract the text into one of the supported formats.
To enable the user exit, edit the following ASCII files:
Windows NT:
%DMBMMPATH%\instance\%DB2INSTANCE%\db2tx\descl.ini
%DMBMMPATH%\instance\%DB2INSTANCE%\db2tx\txinsnnn\dessrv.ini
UNIX:
$DB2TX_INSTOWNERHOMEDIR/db2tx/descl.ini
$DB2TX_INSTOWNERHOMEDIR/db2tx/txinsnnn/dessrv.ini
by adding the following statements:
[DOCUMENTFORMAT]
USEREXIT=name_of_executable
where <name_of_executable> is the name of the user exit. You can specify a fully qualified file name, or, if the user exit is stored in a directory that is in the PATH statement, you can specify only the file name.
The parameters of the user exit must be as follows:
<name_of_user_exit> -sourcefile <sourcefilename> -targetfile <targetfilename> -sourceccsid <sourceccsid> -targetccsid <targetccsid> -sourceformat <sourceformat> -targetformat <targetformat>
The user exit must read the document from the <sourcefilename> and write the converted document to the <targetfilename>. The file names must be fully qualified. The target file must match the <targetccsid> and <targetformat>. The target format must be TDS. The target CCSID must be 850.
When you enable the text column or external documents, specify a format other than TDS (flat ASCII). If you specify TDS, the user exit is not called.
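The calling convention described above can be illustrated with a minimal sketch of a user exit written in Python. It is not part of Text Extender. It assumes, for illustration only, that the source document is already plain text in a different code page, so "conversion" is just re-encoding; a real exit would replace that step with its own text extraction. The CCSID_TO_CODEC table and the function names are hypothetical, and the table covers only a few CCSIDs whose Python codec names are well known.

```python
import argparse
import sys

# Hypothetical, partial mapping from CCSID numbers to Python codec names.
CCSID_TO_CODEC = {437: "cp437", 819: "latin-1", 850: "cp850", 1252: "cp1252"}


def parse_args(argv):
    """Parse the six parameters that are passed to the user exit."""
    parser = argparse.ArgumentParser(description="Sketch of a user exit")
    parser.add_argument("-sourcefile", required=True)
    parser.add_argument("-targetfile", required=True)
    parser.add_argument("-sourceccsid", required=True, type=int)
    parser.add_argument("-targetccsid", required=True, type=int)
    parser.add_argument("-sourceformat", required=True)
    parser.add_argument("-targetformat", required=True)
    return parser.parse_args(argv)


def convert(args):
    """Read the source file and write flat text in the target CCSID."""
    for ccsid in (args.sourceccsid, args.targetccsid):
        if ccsid not in CCSID_TO_CODEC:
            raise ValueError("CCSID %d is not handled by this sketch" % ccsid)
    with open(args.sourcefile, "rb") as src:
        raw = src.read()
    # Assumption: the source is plain text. A real exit would extract
    # the text from its own document format here instead.
    text = raw.decode(CCSID_TO_CODEC[args.sourceccsid])
    with open(args.targetfile, "wb") as dst:
        # Characters missing from the target code page become "?".
        dst.write(text.encode(CCSID_TO_CODEC[args.targetccsid], errors="replace"))


def main(argv=None):
    convert(parse_args(argv))
    return 0


# Run only when parameters were supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main())
```

Because the target CCSID is taken from the command line rather than hard-coded, the sketch writes code page 850 whenever it is called with -targetccsid 850, as required above.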
Text Extender also needs to know in which language a document is written so that the correct dictionary can be used for the linguistic processing that occurs. Here is a list of the language parameters that you can specify when you enable a text column or external documents:
Each DB2 database uses a particular code page for storing character data. Text Extender, as an application working with DB2, runs using the same code page as the database.
Documents can be indexed if they are in one of the following CCSIDs. During a search, the search string is interpreted using the CCSID of the database.
Data stored in DB2 UDB character datatypes, such as VARCHAR or CLOB, are converted by DB2 UDB into the CCSID of the database. So, when enabling a text column for search, use the CCSID of the database as the CCSID parameter. When you enable a text column for search, you can avoid data conversion by DB2 by using a BLOB or binary datatype, and using the actual CCSID of the documents.
Note: CCSIDs 861, 865, and 4946 are not supported by DB2 UDB. To index documents having these CCSIDs, store the documents in a column with a binary data type (BLOB or FOR BIT DATA).
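The rule for choosing the CCSID parameter can be summarized in a short sketch. The function name and the set of type names are hypothetical and are used only to illustrate the decision; they are not part of Text Extender or DB2.

```python
# Hypothetical set of column types that DB2 does not convert.
BINARY_TYPES = {"BLOB", "FOR BIT DATA"}


def choose_ccsid(column_type, database_ccsid, document_ccsid):
    """Return the CCSID to specify when enabling a text column."""
    if column_type.upper() in BINARY_TYPES:
        # Binary data is stored unconverted, so the documents keep
        # their original CCSID.
        return document_ccsid
    # Character data is converted by DB2 into the database CCSID.
    return database_ccsid
```

For example, documents in CCSID 861 that are stored in a BLOB column would be enabled with CCSID 861, whereas documents stored in a VARCHAR column would be enabled with the CCSID of the database.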