>>-+-CONTAINS------+--(--handle--,--search-argument--)--------->< +-NO_OF_MATCHES-+ +-RANK----------+ '-SEARCH_RESULT-'
This section describes how to use the UDFs provided with Text Extender to search in DB2 databases containing text. It tells you how to:
The use of SEARCH_RESULT is described in Improving search performance.
Each of these UDFs searches in the text index for occurrences of the search argument. If there are, say, 100 000 text documents in the table, the CONTAINS, RANK, or NO_OF_MATCHES UDF is called 100 000 times. But the text index is not searched 100 000 times. Instead, the first time the UDF is called, an internal list of all the documents containing the search term is created; subsequent calls of the UDF determine if the document concerned is in the list.
Tip |
---|
When you use the Text Extender UDFs to search in a table, be sure to pass the handle column to the UDF, rather than the text column. If you try to search in a text column, SQL responds with a message indicating that the data type is wrong, for example: No function by the name "CONTAINS" having compatible arguments was found in the function path. |
If you search for text immediately after issuing the ENABLE TEXT TABLE or ENABLE TEXT COLUMN command, an error RC_SE_EMPTY_INDEX can occur which indicates that the index being created by the command does not yet exist. The time taken for an index to be created depends on factors such as the number of documents being indexed, and the performance of the system doing the indexing. It can vary from several minutes to several hours, and should be done when the system is lightly loaded, such as over night.
If this message occurs, try searching again later, or use GET INDEX STATUS to check whether indexing errors have occurred.
This example demonstrates how the CONTAINS function searches for text in documents identified by a handle. It returns 1 if the text satisfies the search argument, otherwise it returns 0.
SELECT DATE, SUBJECT FROM DB2TX.SAMPLE WHERE DB2TX.CONTAINS (COMMENTHANDLE, '"compress"') = 1
In this example, you search for the term compress in the text referred to by the handles in the column COMMENTHANDLE. The handles in the COMMENTHANDLE column indicate where the COMMENT text is indexed.
Tip |
---|
If you have created mixed-case identifiers for tables or columns, remember that these names must be enclosed in double quotes. For example: SELECT DATE, SUBJECT FROM "DB2TX.Sample" WHERE DB2TX.CONTAINS (COMMENTHANDLE, '"compress"') = 1 If you specify DB2 UDB select statements from the command line, the operating system command-line parser removes special characters such as double quotes from the command string, so you must use a backslash to mask these special symbols. For example: DB2 "SELECT DB2TX.file(COMMENTHANDLE) FROM DB2TX.Sample" WHERE DB2TX.CONTAINS (COMMENTHANDLE, '\"compress\"') = 1 |
Use the NO_OF_MATCHES function to determine how often the search criteria are found in each text document.
WITH TEMPTABLE(DATE, SUBJECT, MATCHES) AS (SELECT DATE, SUBJECT, DB2TX.NO_OF_MATCHES(COMMENTHANDLE,'"compress"') FROM DB2TX.SAMPLE) SELECT * FROM TEMPTABLE WHERE MATCHES > 0
NO_OF_MATCHES returns an integer value.
RANK is an absolute value that indicates how well the document met the search criteria relative to other found documents. The value indicates the number of matches found in the document in relation to the document's size. You can get the rank of a found document by using the RANK UDF.
Here is an example:
WITH TEMPTABLE(DATE, SUBJECT, RANK) AS (SELECT DATE, SUBJECT, DB2TX.RANK(COMMENTHANDLE,'"compress"') FROM DB2TX.SAMPLE) SELECT * FROM TEMPTABLE WHERE RANK > 0 ORDER BY RANK DESC
RANK returns a DOUBLE value between 0 and 1.