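Any zone can also be extracted as a collection field. The differences between zones and fields are described under "Zones vs. Fields," earlier in this chapter. In the following style.zon file, every header line of an e-mail message is extracted as a zone, and the From, To, and Subject header lines are also extracted as fields: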
- $control: 1
- zonespec:
- {
- # Extract all header lines of this email message as zones
- header: *
- # also extract these three header lines as fields as well as
- # zones
- header: From
- /field = yes
- header: To
- /field = yes
- header: Subject
- /field = yes
- }
- $$
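As a rough illustration of what this style.zon file asks for, the following Python sketch uses the standard library's `email` module to treat every header line as a zone and to copy the From, To, and Subject headers into fields as well. It is not Verity's zone filter; the function name `extract_zones_and_fields` and the sample message are hypothetical.

```python
from email import message_from_string

# Headers flagged with "/field = yes" in the style.zon example
# (compared case-insensitively, since e-mail header names are).
FIELD_HEADERS = {"from", "to", "subject"}


def extract_zones_and_fields(raw_message: str):
    """Return (zones, fields) for one e-mail message.

    Every header line becomes a zone (mirroring "header: *"); headers
    named in FIELD_HEADERS are additionally copied into fields.
    """
    msg = message_from_string(raw_message)
    zones = [(name, value) for name, value in msg.items()]
    fields = {name: value for name, value in zones
              if name.lower() in FIELD_HEADERS}
    return zones, fields


raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "X-Mailer: ExampleMail\n"
    "\n"
    "Body text.\n"
)
zones, fields = extract_zones_and_fields(raw)
print([name for name, _ in zones])  # ['From', 'To', 'Subject', 'X-Mailer']
print(fields["Subject"])            # Quarterly report
```

Here X-Mailer ends up only as a zone, while the three flagged headers appear in both lists, which matches the intent of the comments in the style.zon file.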
Any zone designated as a field in the style.zon file must also be listed in the collection's style.ufl or style.sfl file. Otherwise, when the zone filter extracts these fields, the indexer has no place to store the values, and the values are ignored.
The field definitions for the Verity standard fields are included in the default style.sfl file. This file contains definitions for several fields, including "Subject", which is aliased to the field named "Title". All of the built-in zone filter modes populate "Title" by default, and the e-mail and news zone modes also populate the "To" and "From" fields.
While the standard field definitions let the zone filter store zones as collection fields, sometimes you need to create custom field definitions. For example, if you are using the HTML zone mode and want to define the "To" and "From" zones as custom fields, you must provide a field definition for each of those zone names in the style.ufl file. Here is the data table of the style.ufl file to use with the previous style.zon file:
- data-table: ddf
- {
- # User fields go here. These fields are also listed in
- # the style.zon file
- varwidth: From dd1
- varwidth: To dd4
- }
For more information about the style.ufl file, refer to Chapter 4, "Field Definitions."
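The rule that every extracted value needs a matching field definition can be sketched in Python. This hypothetical helper reads the `varwidth:` names from the style.ufl data table above, adds a small assumed stand-in for style.sfl (only "Title" and "Snippet", with "Subject" aliased to "Title"), and drops any value that has no definition. The real style.sfl defines many more fields, and the indexer's actual behavior may differ in detail.

```python
import re

# The style.ufl data table from the example above.
STYLE_UFL = """
data-table: ddf
{
    varwidth: From dd1
    varwidth: To dd4
}
"""

# Hypothetical stand-in for a few standard fields in style.sfl,
# with "Subject" aliased to "Title" as the text describes.
STANDARD_FIELDS = {"Title", "Snippet"}
ALIASES = {"Subject": "Title"}


def defined_fields(ufl_text: str) -> set:
    """Collect the field names declared on 'varwidth:' lines."""
    return set(re.findall(r"^\s*varwidth:\s*(\S+)", ufl_text, re.MULTILINE))


def store(extracted: dict, ufl_text: str):
    """Split extracted zone values into stored fields and ignored names."""
    known = defined_fields(ufl_text) | STANDARD_FIELDS
    stored, ignored = {}, []
    for name, value in extracted.items():
        target = ALIASES.get(name, name)  # resolve aliases such as Subject -> Title
        if target in known:
            stored[target] = value
        else:
            ignored.append(name)          # no definition: the value is dropped
    return stored, ignored


stored, ignored = store(
    {"From": "alice", "To": "bob", "Subject": "Hi", "Cc": "carol"}, STYLE_UFL
)
print(stored)   # {'From': 'alice', 'To': 'bob', 'Title': 'Hi'}
print(ignored)  # ['Cc']
```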
Extracting HTML Zones as Fields
The "zone" filter supports a method for extracting zones as fields that differs from the method used by the "flt_meta" filter to extract meta tags as fields (as described in the next section, "Extracting META Tags as Fields").
The "zone" filter watches the HTML in the document stream and produces field tokens based on the zone names specified in the style.zon file, where each zone name corresponds to a tag name. In the collection's internal documents table, the tag name becomes the field name and the tag's value becomes the field value.
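To make the tag-name-to-field mapping concrete, here is a Python sketch built on `html.parser.HTMLParser`. It assumes the zone names are plain tag names (here `title` and `h1`, chosen only for illustration) and collects the text inside each one under a field of the same name. It ignores nesting subtleties and everything else the real "zone" filter does.

```python
from html.parser import HTMLParser


class ZoneFieldParser(HTMLParser):
    """Collect text inside selected tags; the tag name becomes the field name."""

    def __init__(self, zone_names):
        super().__init__()
        self.zone_names = {z.lower() for z in zone_names}  # HTMLParser lower-cases tags
        self.open_zones = []  # stack of zone tags currently open
        self.fields = {}      # field name -> accumulated text

    def handle_starttag(self, tag, attrs):
        if tag in self.zone_names:
            self.open_zones.append(tag)
            self.fields.setdefault(tag, "")

    def handle_endtag(self, tag):
        if self.open_zones and self.open_zones[-1] == tag:
            self.open_zones.pop()

    def handle_data(self, data):
        # Text belongs to every zone that is currently open.
        for tag in self.open_zones:
            self.fields[tag] += data


parser = ZoneFieldParser(["title", "h1"])
parser.feed("<html><head><title>Annual Report</title></head>"
            "<body><h1>Summary</h1><p>Text</p></body></html>")
parser.close()
print(parser.fields)  # {'title': 'Annual Report', 'h1': 'Summary'}
```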
style.uni file.
The default style.uni file automatically invokes the "flt_meta" content filter with the "zone" filter. Here is an example of a short style.uni file that can filter HTML documents with meta tags (the type: statement below also appears in the default style.uni file):
- $control: 1
- types:
- {
- autorec: "flt_rec"
- autorec: "flt_kv -recognize"
- type: text/html
- /charset = guess
- /def-charset = 1252
- /content-filter = "zone -html -nocharmap"
- /content-filter = "flt_meta"
- # if we get anything else, just skip it
- default:
- /action = skip
- }
- $$
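The types: block reads as a dispatch table: HTML documents pass through the "zone" filter and then "flt_meta", in that order, and any other type is skipped. The following Python sketch models only that dispatch. The `autorec` recognizers that determine a document's type are not modeled, and the `plan_for` function and the "index" action label are hypothetical.

```python
# Hypothetical model of the "types:" block above: each MIME type maps to
# its settings and an ordered chain of content filters.
TYPE_RULES = {
    "text/html": {
        "charset": "guess",
        "def_charset": "1252",
        "content_filters": ["zone -html -nocharmap", "flt_meta"],
    },
}
DEFAULT_ACTION = "skip"  # mirrors "default: /action = skip"


def plan_for(mime_type: str):
    """Return (action, filter chain) for an already-recognized MIME type."""
    rule = TYPE_RULES.get(mime_type)
    if rule is None:
        return DEFAULT_ACTION, []
    return "index", list(rule["content_filters"])


print(plan_for("text/html"))  # ('index', ['zone -html -nocharmap', 'flt_meta'])
print(plan_for("image/png"))  # ('skip', [])
```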
name attribute, and the field value is the value of the content attribute in the meta tag.
A sample <META> tag in HTML is shown below:
style.sfl file, the field name "Abstract" is populated with the value "This is a long document". A field definition that corresponds to the meta tag's name attribute must appear in the style.ufl or style.sfl file in order for the field to be populated by the filter. In the example above, the field named "Abstract" is aliased to the "Snippet" field in the default style.sfl file, so you do not need to add a field definition.
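The sample <META> tag itself did not survive in this text; judging from the surrounding description, it has the form `<META NAME="Abstract" CONTENT="This is a long document">`. The following Python sketch, which is not the "flt_meta" implementation, reads such tags with `html.parser.HTMLParser`, resolves the alias from "Abstract" to "Snippet", and populates a field only if a definition exists. The `DEFINED_FIELDS` set is a hypothetical stand-in for the style.ufl and style.sfl definitions.

```python
from html.parser import HTMLParser

# Assumed subset of field definitions: "Abstract" is aliased to "Snippet",
# as the text says the default style.sfl does.
ALIASES = {"Abstract": "Snippet"}
DEFINED_FIELDS = {"Snippet", "Title"}


class MetaFieldParser(HTMLParser):
    """Turn <META name=... content=...> tags into field values."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute names arrive lower-cased
        name, content = attrs.get("name"), attrs.get("content")
        if name is None or content is None:
            return
        target = ALIASES.get(name, name)
        if target in DEFINED_FIELDS:  # undefined names are not populated
            self.fields[target] = content


p = MetaFieldParser()
p.feed('<META NAME="Abstract" CONTENT="This is a long document">')
p.close()
print(p.fields)  # {'Snippet': 'This is a long document'}
```

Removing "Snippet" from `DEFINED_FIELDS` leaves `fields` empty, which mirrors the requirement that a matching field definition exist before the filter can populate the field.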