(C) IBM Corp. 1996, 1999

Text Extender: Administration and Programming


Glossary

This glossary defines many of the terms and abbreviations used in this manual. If you do not find the term you are looking for, refer to the index or to the Dictionary of Computing, New York: McGraw-Hill, 1994.

A

access function
A user-provided function that converts the data type of text stored in a column to a type that can be processed by Text Extender.

administration
The task of preparing text documents for searching, maintaining indexes, and getting status information.

API
Application programming interface.

application programming interface (API)
A general-purpose interface between application programs and the Text Extender information retrieval services.

B

Boolean search
A search in which one or more search terms are combined using Boolean operators.

bound search
A search in Korean documents that respects word boundaries.

browse
To view text displayed on a computer monitor.

browser
A Text Extender function that enables you to display text on a computer monitor.

C

catalog view
A view of a system table created by Text Extender for administration purposes. A catalog view contains information about the tables and columns that have been enabled for use by Text Extender.

CCSID
Coded Character Set Identifier.

code page
An assignment of graphic characters and control function meanings to all code points. For example, assignment of characters and meanings to 256 code points for an 8-bit code.

command line processor
A program called DB2TX that:

Allows you to enter Text Extender commands

Processes the commands

Displays the result.

common-index table
A DB2 table whose text columns share a common text index. See also multi-index table.

count
A keyword used to specify the number of levels (the depth) of terms in the thesaurus that are to be used to expand the search term for the given relation.
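
The effect of the count keyword can be illustrated with a small sketch. It is not Text Extender code: the thesaurus contents, the single "narrower term" relation, and the function name expand are assumptions made for this illustration only.

```python
from collections import deque

# Hypothetical toy thesaurus: each term maps to its directly related terms
# for one relation. Real thesauri are built with Text Extender's own tools.
THESAURUS = {
    "vehicle": ["car", "truck"],
    "car": ["sedan", "coupe"],
    "truck": ["pickup"],
    "sedan": ["compact sedan"],
}


def expand(term, count):
    """Return the search term plus related terms up to `count` levels deep."""
    expanded = [term]
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == count:
            # The requested depth is reached; do not follow this term further.
            continue
        for related in THESAURUS.get(current, []):
            if related not in seen:
                seen.add(related)
                expanded.append(related)
                queue.append((related, depth + 1))
    return expanded
```

With this toy data, a count of 1 adds only "car" and "truck" to "vehicle", while a count of 2 also adds "sedan", "coupe", and "pickup".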

D

data stream
Information returned by an API function, comprising text (at least one paragraph) containing the term searched for, and information for highlighting the found term in that text.

DB2 Extender
One of a group of programs that let you store and retrieve data types beyond the traditional numeric and character data, such as image, audio, and video data, and complex documents.

DBCS
Double-byte character set.

dictionary
A collection of language-related linguistic information that Text Extender uses during text analysis, indexing, retrieval, and highlighting of documents in a particular language.

disable
To restore a database, a text table, or a text column to its condition before it was enabled for Text Extender by removing the items created during the enabling process.

distinct type
See user-defined distinct type.

document
See text document.

document handle
See handle.

document model
The definition of the structure of a document in terms of the sections that it contains. A document model makes Text Extender aware of the sections within documents when indexing. A document model lists the markup tags that identify the sections. For each tag you can specify a descriptive section name for use in queries against that section. You can specify one or more document models in a document model file.

dual index
A text index having the characteristics of a precise index and a linguistic index. See also Ngram index.

E

enable
To prepare a database, a text table, or a text column for use by Text Extender.

environment variable
A variable used to provide defaults for values for the Text Extender environment.

environment profile
A script provided with Text Extender containing settings for environment variables.

escape character
A character indicating that the subsequent character is not to be interpreted as a masking character.

expand
The action of adding to a search term additional terms derived from a thesaurus.

extended matching
A process involving the use of a dictionary to highlight terms that are not obvious matches of the search term.

extender
See DB2 Extender.

external file
A text document in the form of a file stored in the operating system's file system, rather than in the form of a cell in a table under the control of DB2.

F

feature search
A search for terms such as names of people, places, or organizations, made in a linguistic index created using the FEATURE_EXTRACTION indexing option.

file handle
See handle.

format
The type of a document, such as ASCII or WordPerfect.

free-text search
A search in which the search term is expressed as free-form text: a phrase or a sentence describing in natural language the subject to be searched for.

function
See access function.

fuzzy search
A search that can find words whose spelling is similar to that of the search term.

H

handle
A binary value that identifies a text document. It includes:

A document ID

The name and location of the associated index

The document's text information

If the document is located in an external file not under the control of DB2, the path and name of the file.

A handle is created for each text document in a text column when that column is enabled for use by Text Extender.

highlighting information
See data stream.

hybrid search
A combined Boolean search and free-text search.

I

index
To extract significant terms from text, and store them in a text index.

index characteristics
Properties of a text index determining:

The directory where the index is stored

The index type

The frequency with which the index is updated

When the first index update is to occur.

index type
A characteristic of a text index determining whether it contains exact or linguistic forms of document terms, or both. See precise index, linguistic index, dual index, and Ngram index.

initialized handle
A handle, prepared in advance, containing only the text format, or the text language, or both.

instance
A logical Text Extender environment. You can have several instances of Text Extender on the same workstation, but only one instance for each DB2 instance. You can use these instances to:

Separate the development environment from the production environment

Restrict sensitive information to a particular group of people.

instance variable
A variable used to provide a default value for the name of the instance owner, or the name of the instance owner's home directory.

L

language
The name of a dictionary to be used when indexing, searching and browsing.

linguistic index
A text index containing terms that have been reduced to their base form by linguistic processing. "Mice", for example, would be indexed as "mouse". See also precise index, Ngram index, and dual index.
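
The difference between a linguistic index and a precise index can be sketched as follows. This is an illustration only: the base-form table stands in for a real dictionary, and the names BASE_FORMS and build_index are hypothetical, not part of Text Extender.

```python
from collections import defaultdict

# Hypothetical stand-in for a language dictionary: inflected form -> base form.
BASE_FORMS = {"mice": "mouse", "ran": "run", "running": "run"}


def build_index(docs, linguistic):
    """Map each indexed term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            # A linguistic index stores the base form; a precise index
            # stores the word as it occurs (lowercased here for simplicity).
            term = BASE_FORMS.get(word, word) if linguistic else word
            index[term].add(doc_id)
    return index


docs = {1: "Mice ran", 2: "a mouse"}
precise = build_index(docs, linguistic=False)
linguistic = build_index(docs, linguistic=True)
```

In this sketch, a search for "mouse" finds only document 2 in the precise index, but finds documents 1 and 2 in the linguistic index, because "Mice" was indexed as "mouse".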

logical node
A node assigned with other nodes to the same physical machine. See also physical node.

log table
A table created by Text Extender containing information about which text documents are to be indexed. Triggers are used to store this information in a log table whenever a document in an enabled text column is added, changed, or deleted.
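
The interplay of triggers and a log table can be sketched with SQLite from Python. This is an analogy, not the Text Extender implementation: Text Extender creates its own triggers and log tables in DB2 when a text column is enabled, and the table, column, and trigger names below are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical text table and log table.
CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE log_table (doc_id INTEGER, operation TEXT);

-- One trigger per kind of change records the affected document.
CREATE TRIGGER docs_ins AFTER INSERT ON docs
BEGIN
    INSERT INTO log_table VALUES (NEW.id, 'insert');
END;
CREATE TRIGGER docs_upd AFTER UPDATE OF body ON docs
BEGIN
    INSERT INTO log_table VALUES (NEW.id, 'update');
END;
CREATE TRIGGER docs_del AFTER DELETE ON docs
BEGIN
    INSERT INTO log_table VALUES (OLD.id, 'delete');
END;
""")

conn.execute("INSERT INTO docs (id, body) VALUES (1, 'first draft')")
conn.execute("UPDATE docs SET body = 'second draft' WHERE id = 1")
conn.execute("DELETE FROM docs WHERE id = 1")

# The log now lists every change that a later index update must process.
pending = conn.execute(
    "SELECT doc_id, operation FROM log_table ORDER BY rowid"
).fetchall()
```

The application only changes the docs table; the log entries are written by the triggers, which is why indexing can later run independently of the statements that changed the text.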

M

masking character
A character used to represent optional characters at the front, middle, and end of a search term. Masking characters are normally used for finding variations of a term in a precise index.
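
How masking characters and an escape character interact can be sketched by translating a masked term into a regular expression. The sketch assumes that "%" stands for any number of characters, that "_" stands for exactly one character, and that "!" has been chosen as the escape character; these choices and the function names are assumptions for illustration, so check the search syntax chapter for the characters your installation uses.

```python
import re


def mask_to_regex(term, escape="!"):
    """Translate a masked search term into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == escape and i + 1 < len(term):
            # The character after the escape character is taken literally.
            parts.append(re.escape(term[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any number of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


def matches(term, word):
    """Return True if the whole word matches the masked term."""
    return mask_to_regex(term).fullmatch(word) is not None
```

For example, matches("compress%", "compression") is true, while matches("100!%", "1000") is false because the escaped "%" must occur literally in the word.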

match
The occurrence of a search term in a text document.

multi-index table
A DB2 table whose text columns have individual text indexes. See also common-index table.

N

Ngram index
A text index that supports DBCS documents and fuzzy search of SBCS documents. See also linguistic index, precise index, and dual index.
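
Why character n-grams lend themselves to fuzzy search can be sketched as follows. The example splits words into overlapping three-character sequences and compares them with a Jaccard similarity; both the padding scheme and the similarity measure are assumptions chosen for illustration and are not a description of how Text Extender scores matches.

```python
def ngrams(word, n=3):
    """Return the set of overlapping character n-grams of a padded word."""
    padded = f" {word.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity of two words' n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)
```

A misspelling such as "serach" still shares some n-grams with "search", so it scores higher than an unrelated word such as "index", which shares none.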

node
A server in a partitioned database environment. See also logical node, physical node, and nodegroup.

nodegroup
A named subset of one or more database partition servers.

O

occurrence
Synonym for match.

P

partitioned database
A database consisting of several parts, each of which is maintained by a separate database partition server.

periodic indexing
Indexing at predetermined time intervals, expressed in terms of the day, hour, and minute, and the minimum number of document names that must be listed in the log table before indexing can take place.

physical node
A node assigned to a physically separate machine. See also logical node.

precise index
A text index containing terms exactly as they occur in the text document from which they were extracted. See also linguistic index, Ngram index, and dual index.

profile
See environment profile.

R

rank
An absolute value of type DOUBLE between 0 and 1 that indicates how well a document meets the search criteria relative to the other found documents. The value indicates the number of matches found in the document in relation to the document's size.

refine
To add the search criteria from a previous search to other search criteria to reduce the number of matches.

retrieve
To find a text document using a search argument in one of Text Extender's search functions.

S

SBCS
Single-byte character set.

search argument
The conditions specified when making a search, consisting of one or several search terms, and search parameters.

shell profile
See environment profile.

stop word
A common word, such as "before", in a text document that is to be excluded from the text index, and ignored if included in a search argument.
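
The treatment of stop words can be sketched in a few lines. The stop-word list and the function name below are hypothetical; Text Extender ships its own stop-word lists per language.

```python
# Hypothetical stop-word list for the example.
STOP_WORDS = {"a", "and", "before", "of", "the"}


def significant_terms(text):
    """Return the words of a text that are not stop words, in order."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]
```

Applying the same filter to documents and to search arguments keeps the two consistent: significant_terms("The calm before the storm") yields only "calm" and "storm", whether the text is being indexed or searched for.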

T

text column
A column containing text documents.

text configuration
Default settings for index, text, and processing values.

text document
Text of type CHAR, GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC, DBCLOB, VARCHAR, LONG VARCHAR, or CLOB, stored in a DB2 table.

text index
A collection of significant terms extracted from text documents. Each term is associated with the document from which it was extracted. A significant improvement in search time is achieved by searching in the index rather than in the documents themselves. See also precise index, linguistic index, and dual index.

text information
Properties of a text document describing:

The CCSID

The format

The language.

text table
A DB2 table containing text columns.

tracing
The action of storing information in a file that can later be used in finding the cause of an error.

trigger
A mechanism that automatically adds information about documents that need to be indexed to a log table whenever a document is added, changed, or deleted from a text column.

U

UDF
User-defined function.

UDT
User-defined distinct type.

update frequency
The frequency with which a text index is updated, expressed in terms of the day, hour, and minute, and the minimum number of document names that must be listed in the log table for indexing, before indexing can take place.

user-defined distinct type (UDT)
A data type created by a user of DB2, in contrast to a data type provided by DB2 such as LONG VARCHAR.

user-defined function (UDF)
An SQL function created by a user of DB2, in contrast to an SQL function provided by DB2. Text Extender provides search functions, such as CONTAINS, in the form of UDFs.

W

wildcard character
See masking character.

