Table of Contents

Preface

About This Manual
Scope and Usage
Audience
Related Documentation
Using This Manual
Organization of This Manual
Stylistic Conventions
Verity Technical Support

Part I: Overview

Chapter 1 Overview

Collection Features
What Is a Collection?
Features of the Collection Architecture
Contents of Collection Indexes
Dynamic Access to Documents
Universal Document Support
Optional Indexes
Collection Configuration
The Collection Directory
Collection Style Files
Collection Optimizations
Altering Indexing Behavior
Inside a Verity Collection
Types of Indexes
Documents Table
Persistent Fields
Transitory Fields
What Are Partitions?
Index Tuning
Defining and Populating Fields
Document Filters and Formatting
Universal Filter and the Helper Filters
Using Your Own Filter
Indexing Modes
Topic Sets
Troubleshooting and Maintenance Tools

Chapter 2 Using Gateways

Gateway Configuration Overview
Primary Document Key (VdkVgwKey) - New Format
Simple Keys
Gateway Field Types
Security Method
Using Separate Gateways for Indexing and Viewing
Gateway-Related Style Files
Using Different Gateways for Indexing and Viewing
Using the HTTP Gateway
Features
Security
Style Directory
Configuration File - Syntax
Configuration File - Sample
Using the File System Gateway
Features
Style Directory
Configuration File - Syntax

Part II: Collection Style

Chapter 3 Setting Policies

About Indexing Modes
What Indexing Modes Do
Dynamically Changing Modes
Background vs. Administrative Optimizations
Using Indexing Modes
Using the Verity API
Using mkvdk
Built-in Indexing Modes
Generic Mode (generic)
News Feed Optimizer Mode (newsfeedopt)
Custom Indexing Modes
Metaparameter Modifiers in the style.plc File
Defining a Custom Mode
Defining a Default Indexing Mode
Inheriting From a Predefined Indexing Mode
Defining Multiple Custom Indexing Modes
Returning Document Counts
Using a style.plc File

Chapter 4 Field Definitions

Data Types
Data Tables
Field Types
Constant Fields
Variable Fields
Field Definition Files
Internal Fields - In the style.ddd File
style.ddd Contents
Standard Fields - In the style.sfl File
Field Aliases in style.sfl
style.sfl Contents
Custom User Fields - In the style.ufl File
style.ufl Contents
style.ufl File Syntax
Mandatory Statements
$control
descriptor
data-table
Constant Field Types
constant
autoval
worm
Variable Field Types
fixwidth
fixwidth Length and Ranges for Integer Data Types
varwidth
dispatch

Chapter 5 Index Tuning

Tuning Collection Index Contents
Style Files Affecting Collection Index Contents
Setting Up the Collection style Directory
Using the style.lex File
style.lex File Syntax Reference
General Information
define Statements
token Statements
Statement Interpretation
Character Mapping
Using the style.stp File
style.stp Syntax Reference
style.stp Features
Case Sensitivity
Regular Expressions
Using the style.go File
style.go Syntax Reference
Using the style.ufl File
Default style.ufl File
Indexed Field Type
Minmax Field Type
Using the style.prm File
Feature Vectors
Stored Summaries
Summary Types
Summary Type Parameters
Zone Argument
Case-Insensitive Word Indexes
Instance Vector Encodings (PSW and WCT)
PSW Encoding
WCT Encoding
Instance Vector Encoding and Searching
SENTENCE and PARAGRAPH Operators
Soundex Data
Highlight Location Data
Qualify Instance Data
Enable Indexing on Nouns and Noun Phrases
style.prm File Syntax
Default style.prm File
Using the style.fxs File

Chapter 6 Document Filters and Formatting

The Virtual Document
Document Layout Definition
Document Filter Specification
Default style.dft File
Universal Filter Functionality
Using the style.dft File
style.dft File Syntax
style.dft File Statements
style.dft File Keywords
Shorthand Notation for zone-begin and zone-end
style.dft Keyword Modifiers
Date Formats in the style.dft File
Late Binding for Field Elements
The Universal Filter
Invoking the Universal Filter
How the Universal Filter Works
Components
How Filtering Occurs
Character Set Recognition and Mapping
Checking File Types
Using the style.uni File
Syntax of style.uni File Statements
Syntax of style.uni File Keywords
style.uni Keyword Modifiers
Configuration
Changing the style.uni File
Universal Filter Document Types
Recognized Document Types
Recognized Categories of Document Types
The KeyView Filters
The PDF Filter
Custom Lexing Rules Not Supported
Specifying the PDF Filter
Using the -fieldoverride Option
Using the -charmapto Option
PDF Fields
Standard PDF Fields
Optional PDF Fields
Defining Optional PDF Fields
The XML Filter
Requirements for Indexing XML Documents
Requirements for Data Files
Implementation Summary
XML Filter
Style Files
Style File Configuration
style.uni File
style.xml File
style.ufl File
style.dft File
Indexing XML Documents

Chapter 7 Searching Documents by Fields

Methods for Populating Fields
Using the Bulk Modify Feature
Extracting Field Values
style.tde Syntax

Part III: Search Features

Chapter 8 The Zone Filter

Zone Filter Overview
Introduction to Zones
Document Types
Zones vs. Fields
Advantages of Using a Field
Advantages of Using a Zone
Processing Order
Zones and Zone Occurrences
Using the Zone Filter
Specifying the Zone Filter
Built-in Zone Mode Options
Character Mapping Options
Extracting META Tags as Fields
Extracting Zones as Fields
Zones for Markup Language Documents
How the Zone Filter Parses Markup Language Documents
Implicit Zone Endings
Zones for HTML Documents
Zone Filter Specification for HTML
Supported HTML Tags
Supported HTML Entities
Additional HTML Parsing Rules
Zones for SGML and XML Documents
Zone Filter Specification for SGML and XML
Using the style.zon File
Zones for Internet Message Format Documents
How the Zone Filter Parses Internet Message Format Documents
Zone Filter Specification for E-mail
Using the style.zon File
Zone Filter Specification for Usenet News
Using the style.zon File
Custom Zone Definitions
Built-in vs. Custom Zone Definitions
style.zon File
style.zon Default Behavior
style.zon File Syntax
The zonespec Keyword
The element Keyword
The attribute Keyword
The entity Keyword
Entity Substitution
Dumping style.zon Information
Debugging the style.zon File
Dumping Information for Built-In Modes
Modifying Built-in Behavior
Attribute Extraction
Defining Zones as Collection Fields
Extracting HTML Zones as Fields
Extracting META Tags as Fields
Defining Zones for Virtual Documents
Hidden Elements in Zones
Entries in the style.dft File
Searching over Hidden Zones
Special Noindex and Noextract Zones
Noindex Zones
Noextract Zones
Hidden Elements in Noextract Zones
Searching in Zones
Using the Query Language IN Operator
Using a Custom Query Parser
Searching Multiple Zone Occurrences

Appendixes

Appendix A Style Files

Summary of Style Files
Standard Style Files
Editable Standard Style Files
Non-Editable Style Files
Gateway-Related Style Files
Sample StyleSets
Using Custom Style Files

Appendix B Universal Filter Document Types

Supported Document Types
HTML and Tagged ASCII Formats
WYSIWYG Formats
Word Processing
Spreadsheets
Presentation Graphics
Recognized Document Types
Recognized Categories of Document Types
XML Documents
Troubleshooting
Checking File Types
Disable Document Filters by MIME Type

Appendix C Supported Date Formats

Date Import Formats
Date Import Format Strings
Table Conventions
Zulu Date Format
Time Formats
Numeric Date Formats

Appendix D Supported Languages

Languages and Character Sets
Specifying Languages and Character Sets

Appendix E Regular Expressions

Operators for Regular Expressions
Symbols
Substrings

Appendix F Custom Thesaurus

The Thesaurus Control File
Sample Thesaurus Control File
The synonyms Keyword
The list Keyword
The qparser Keyword
Using the mksyd Command-line Tool
Creating a Control File from an Existing Thesaurus
Modifying a Control File for a Language Other than English
Building a Custom English Thesaurus
Building a Custom Non-English Thesaurus
Thesaurus Integration
Using a Knowledge Base Map to Point to a .syd File

Appendix G Collection Troubleshooting and Maintenance Tools

Using didump
Viewing the Word List
Viewing the Zone List
Viewing the Zone Attribute List
Using browse
Displaying Fields in a Documents Table File
Using rcvdk
Starting rcvdk
Attaching to a Collection
Attaching to Subsequent Collections
Basic Searching
Viewing Results
Authenticating with rcvdk
Displaying More Fields
Using merge
Merging Collections
Splitting Collections
Using mkvdk for Incremental Squeeze
Options for 8-bit Characters

Appendix H Collection Limits

Appendix I Migrating Indexes

Migrating a Thesaurus File
Migrating HTTP- and File System-based Collections
Notes
Instructions

Index

Copyright © 2002, Verity, Inc. All rights reserved.