Zone Filter Overview


This section provides an introduction to zones, describes the kinds of documents that use zones, and explains the differences between fields and zones.

Introduction to Zones

Zones are specific regions of a document to which searches can be limited. The Verity engine uses the zone filter to build zone information into a collection's full-word index. The enhanced index permits quick and efficient searches using zones. A zone may be automatically defined by the zone filter, or you may define it in the style.zon file.
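
As a rough illustration of the idea (not the Verity engine's actual index format), the sketch below builds a toy word index and records zones as character-offset spans alongside it. The function names, the dictionary layout, and the "subject" zone are assumptions made for this example only.

```python
import re
from collections import defaultdict

def build_index(text, zones):
    """Build a toy word index plus zone spans.

    text  -- the document text
    zones -- hypothetical mapping: zone name -> list of (start, end) offsets
    """
    postings = defaultdict(list)  # word -> character offsets where it starts
    for match in re.finditer(r"\w+", text.lower()):
        postings[match.group()].append(match.start())
    return {"postings": dict(postings), "zones": zones}

def search_in_zone(index, word, zone):
    """Return offsets of `word` that fall inside any occurrence of `zone`."""
    spans = index["zones"].get(zone, [])
    hits = index["postings"].get(word.lower(), [])
    return [pos for pos in hits
            if any(start <= pos < end for start, end in spans)]

doc = "Subject: zone filters\n\nThis message discusses filters and fields."
subject_end = doc.index("\n")  # the toy "subject" zone is the first line
index = build_index(doc, {"subject": [(0, subject_end)]})

# Only the occurrence of "filters" on the subject line is reported.
subject_hits = search_in_zone(index, "filters", "subject")
```

Because the lookup starts from the word index and only then checks the zone spans, restricting a query to a zone does not require scanning the document text again.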

Zone searching is useful when you believe that limiting your search to a particular zone will produce more accurate search results. Speed is not a factor, since searching a zone takes the same amount of time as searching the entire document. Note, however, that zone searching is faster than field searching: zone searching uses the search engine's fast search algorithm, whereas field searching is a linear process.

Document Types

You can use the zone filter and search over zones for two specific types of documents:

Zones vs. Fields

Fields are extracted from the document and stored in the collection for retrieval and searching, and can be returned on a results list. Zones, on the other hand, are merely the definitions of "regions" of a document for searching purposes, and are not physically extracted from the document in the same way that fields are extracted. The contents of a zone cannot be returned in the results list of an application.

A region of text must first be defined as a zone in order to be a field. Therefore, it can be only a zone, or it can be both a field and a zone. Whether you define a region of text as a zone only or as both a field and a zone depends on your particular requirements.

Advantages of Using a Field

Fields are parsed for content, so comparisons can be performed on their values. For example, you can run a query like the following on a field: date > may 1, 1993. Because zones are not parsed for content, such searches cannot be performed on them.
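
The comparison above can be pictured with a small sketch. It is a hedged model only: the `documents` list, its column names, and `filter_after` are hypothetical, and the code does not reproduce Verity query syntax. It shows why a parsed, typed field value supports a comparison such as "later than May 1, 1993", and why such a search is a linear pass over the table.

```python
from datetime import date

# Hypothetical documents table: one row per document, with parsed field values.
documents = [
    {"id": 1, "title": "Budget memo", "date": date(1993, 4, 12)},
    {"id": 2, "title": "Status report", "date": date(1993, 6, 3)},
    {"id": 3, "title": "Release notes", "date": date(1994, 1, 20)},
]

def filter_after(rows, field, cutoff):
    """Linear scan that keeps rows whose field value is later than cutoff."""
    return [row for row in rows if row[field] > cutoff]

# Rough equivalent of the field query: date > may 1, 1993
matches = filter_after(documents, "date", date(1993, 5, 1))
matching_ids = [row["id"] for row in matches]
```

A zone holds only the boundaries of a region, so there is no parsed date value to compare against.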

Field values can be returned on a results list. This is most useful for those parts of a document that help identify it, such as the title and the author. Zones cannot be returned on a results list.

Advantages of Using a Zone

Arbitrary query searches can be restricted to a zone, but not to a field (which can be searched only with CONTAINS and relative comparisons).

Zones add little to the size of a collection, because the source text is not stored in the binary collection files. Only a description of where each zone starts and ends is stored, so the zone's size does not matter. Field values, by contrast, are stored in a table in the collection and therefore tend to increase the size of the collection rapidly.
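
The storage difference can be sketched as follows. The byte layout is an assumption chosen for illustration (two 32-bit offsets per zone occurrence); the actual collection file format is not described in this section.

```python
import struct

def zone_entry(start, end):
    """Pack a zone occurrence as two 32-bit offsets (hypothetical layout)."""
    return struct.pack("<II", start, end)

def field_entry(text, start, end):
    """Store the field value itself, as a field table would."""
    return text[start:end].encode("utf-8")

def entry_sizes(doc):
    """Return (zone bytes, field bytes) for the region after 'Title: '."""
    start = len("Title: ")
    end = doc.index("\n")
    return len(zone_entry(start, end)), len(field_entry(doc, start, end))

short_doc = "Title: Zones\nBody text."
long_doc = "Title: " + "Zones " * 1000 + "\nBody text."

short_sizes = entry_sizes(short_doc)
long_sizes = entry_sizes(long_doc)
```

In this model the zone entry stays the same size however long the region is, while the field entry grows with the stored value.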

Processing Order

Fields are parsed and populated in the documents table before full-text indexing. Zones, if defined, are interpreted during full-text indexing.

Zones and Zone Occurrences

When you extract a zone from a document, you may be extracting a single zone or multiple occurrences of a zone. For example, a Usenet news or Internet e-mail document has only one Subject: field. However, an HTML document may have several <h2> tags. When you create a zone for <h2>, every occurrence is extracted, and all occurrences are searched when you submit a query on the zone. The specifics of zone searches are discussed in "Searching in Zones," later in this chapter.
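
To make the idea of multiple occurrences concrete, the sketch below collects every <h2> region of an HTML page with Python's standard html.parser module and then searches across all of them. It is an illustration only, not the zone filter itself; the `ZoneCollector` class and the helper functions are names invented for this example.

```python
from html.parser import HTMLParser

class ZoneCollector(HTMLParser):
    """Collect the text of every occurrence of one tag, treated as a zone."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0         # greater than 0 while inside the tag
        self.occurrences = []  # one string per occurrence

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.occurrences.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.occurrences[-1] += data

def zone_occurrences(html, tag):
    """Return the text of each occurrence of `tag` in document order."""
    collector = ZoneCollector(tag)
    collector.feed(html)
    collector.close()
    return collector.occurrences

def search_zone(html, tag, word):
    """Return the occurrences of the zone whose text contains `word`."""
    return [text for text in zone_occurrences(html, tag)
            if word.lower() in text.lower()]

page = ("<html><body><h1>Guide</h1><h2>Installing</h2><p>Run setup.</p>"
        "<h2>Searching zones</h2><p>Zones help.</p></body></html>")

h2_zones = zone_occurrences(page, "h2")
h2_hits = search_zone(page, "h2", "zones")
```

The query is checked against both <h2> occurrences, and the paragraph that also mentions zones is ignored because it lies outside the zone.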





Copyright © 2002, Verity, Inc. All rights reserved.