The following table describes the steps in the pipeline phase.
Step | Description | Related information |
---|---|---|
1. Input queue | When received by an index server, the text document is placed in an input queue for the index server. | |
2. Document preprocessing | XML filtering: The XML filtering utility optionally removes surplus XML elements from XML content. | For information about defining surplus XML elements, see Setting XML elements as non-searchable. |
| | Language identification: The index server identifies the text language for the text document. | For information about selecting text languages, see Selecting text languages for an object store. |
| | Tokenization: The server creates tokens for the text document based upon a language-aware analysis of the text. Word stems and other language constructs are identified. | For information about word stems, see Token searches: Language-aware versus exact-match. |
3. Output queue | After preprocessing, the text document is sent to the output queue for the index server. | |
4. Token indexing | The index entry for the object in the target index is updated with the tokens. | |
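
The four steps in the table can be pictured with a small sketch. The code below is an illustration only and is not the index server's implementation: the function names, the `SURPLUS_XML_ELEMENTS` set, the fixed `"en"` language result, and the in-memory dictionary standing in for the target index are all hypothetical. The toy tokenizer lowercases and splits words; it does not identify word stems as a language-aware analyzer would.

```python
import re
from collections import deque

# Hypothetical set of XML elements treated as non-searchable.
SURPLUS_XML_ELEMENTS = {"metadata"}


def filter_xml(text, surplus=SURPLUS_XML_ELEMENTS):
    """Step 2a: remove surplus XML elements together with their content."""
    for name in surplus:
        pattern = rf"<{re.escape(name)}\b[^>]*>.*?</{re.escape(name)}>"
        text = re.sub(pattern, " ", text, flags=re.DOTALL)
    return text


def identify_language(text):
    """Step 2b: placeholder; a real server would inspect the text itself."""
    return "en"


def tokenize(text, language):
    """Step 2c: toy tokenizer (no stemming); 'language' is unused here."""
    plain = re.sub(r"<[^>]+>", " ", text)  # drop any remaining tags
    return re.findall(r"\w+", plain.lower())


def run_pipeline(documents):
    """Run (object_id, text) pairs through the four pipeline steps."""
    input_queue = deque(documents)  # Step 1: input queue
    output_queue = deque()
    index = {}  # stand-in for the target index: object_id -> set of tokens

    while input_queue:
        object_id, text = input_queue.popleft()
        text = filter_xml(text)
        language = identify_language(text)
        tokens = tokenize(text, language)
        output_queue.append((object_id, tokens))  # Step 3: output queue

    while output_queue:  # Step 4: update the index entry for each object
        object_id, tokens = output_queue.popleft()
        index.setdefault(object_id, set()).update(tokens)
    return index
```

For example, passing `[("doc1", "<doc><metadata>secret</metadata><body>Hello World</body></doc>")]` to `run_pipeline` yields an entry for `doc1` containing only the tokens from the `body` element, because the `metadata` element is filtered out before tokenization.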