If you do not use one of the built-in modes listed in the previous sections, you must specify your own zone definitions in the style.zon file.

Built-in vs. Custom Zone Definitions
You can define zones either by specifying one of the built-in modes or by creating your own style.zon file. However, you cannot use the style.zon file to augment or override the behavior of a built-in mode. If you specify a built-in mode, the engine ignores the style.zon file. If the built-in behavior does not meet your requirements, you must specify a custom set of zone definitions in the style.zon file.
You can use the built-in definitions as a starting point. To dump the definitions for a built-in mode and modify them for use in a style.zon file, follow the steps described in "Dumping style.zon Information," later in this section.
style.zon File
The structure of the style.zon file is as follows:
- $control: 1
- zonespec:
- {
- .
- .
- .
- }
- $$
The style.zon file must reside in the style directory of the collection. The file must begin with $control: 1 and zonespec: on the first and second uncommented lines, respectively, and must end with $$ on a line by itself.

The content of the file depends on the type of document for which you are creating zones and on how you want the various zones stored in the collection. The syntax of the style.zon keywords and sample style.zon files for the various document types are included in "Zones for Markup Language Documents" and "Zones for Internet Message Format Documents" in this chapter.
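The layout rules above can be checked mechanically before a collection is built. The following Python sketch is illustrative only and is not part of the product: the function name check_zon_structure is hypothetical, and the comment marker (#) is an assumption, since this section does not state how lines are commented out.

```python
def check_zon_structure(text, comment_prefix="#"):
    """Return a list of layout problems found in the text of a style.zon file.

    Assumption: comment lines start with `comment_prefix` (not confirmed here).
    """
    # Keep only non-blank lines that are not comments.
    stripped = [line.strip() for line in text.splitlines()]
    active = [line for line in stripped if line and not line.startswith(comment_prefix)]

    problems = []
    # First uncommented line: "$control: 1" (whitespace is ignored in the comparison).
    if len(active) < 1 or "".join(active[0].split()) != "$control:1":
        problems.append("first uncommented line must be '$control: 1'")
    # Second uncommented line: the zonespec keyword.
    if len(active) < 2 or not active[1].startswith("zonespec:"):
        problems.append("second uncommented line must be 'zonespec:'")
    # Last uncommented line: "$$" on a line by itself.
    if not active or active[-1] != "$$":
        problems.append("file must end with '$$' on a line by itself")
    return problems
```

An empty list means the three layout rules are satisfied; the sketch does not validate the keywords inside the braces.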
style.zon Default Behavior
If you do not specify a mode argument in the /filter=zone modifier, and no style.zon file is found, the default behavior is equivalent to the following style.zon file:
- $control: 1
- zonespec:
- {
- element: *
- }
- $$
Use the style.zon file to specify the tags you want to create as zones by using the element and attribute keywords. The examples in the description of the style.zon file refer to common entities in SGML.
You can use the following optional modifier with the zonespec keyword in the style.zon file.
Modifier | Description |
---|---|
/ignoreattributes | Specify YES or NO. The default is YES. Ignores tag attributes unless overridden by a statement beneath it. |
The zonespec keyword appears as follows when using this modifier:
- $control: 1
- zonespec:
- /ignoreattributes = yes
- {
- element: *
- }
- $$
The element keyword specifies extraction or exclusion of element tags. It uses the following syntax:
- element: elementname
where elementname specifies the name of the element (that is, the tag) you want to extract as a zone. Element names are case-insensitive. To extract all tags as zones, use * for elementname. You can use the following optional modifiers with the element keyword.
Modifier | Description |
---|---|
/ignore | Specify YES to ignore the specified element. If you use the asterisk for elementname, only those elements specified with the /ignore=yes modifier are ignored. If you do not use the asterisk, all the elements specified are extracted and those omitted are ignored. |
/field | Specify YES to extract the specified element as a field as well as a zone. See "Defining Zones for Virtual Documents" in this chapter. The extracted field value is stored in the elementname field. To extract attribute names, you must also extract the element name. |
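You can specify elements in two ways. The first is to extract all elements with the asterisk and exclude individual elements with the /ignore modifier. The second is to explicitly list only those elements you want extracted. The following is an example of the first approach: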
- $control: 1
- zonespec:
- {
- element: *
- element: heading3
- /ignore = yes
- element: list-item
- /ignore = yes
- }
- $$
The following is an example of the second approach.
- $control: 1
- zonespec:
- {
- element: header
- element: body
- element: title
- element: textzone
- element: section
- element: sub-section
- element: footnote
- element: appendix
- }
- $$
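The difference between the two approaches can be modeled in a few lines. The Python sketch below is an illustration of the selection rules described for the element keyword, not the filter's actual implementation; the function name extracted_elements and the (elementname, ignore) pair representation are hypothetical.

```python
def extracted_elements(spec, tags_in_document):
    """Return the tags that would become zones under the described rules.

    spec: list of (elementname, ignore) pairs, mirroring element statements
          and their /ignore modifier (True means /ignore = yes).
    """
    # Element names are case-insensitive, so compare in lower case.
    rules = {name.lower(): ignore for name, ignore in spec}
    wildcard = "*" in rules

    result = []
    for tag in tags_in_document:
        key = tag.lower()
        if key in rules:
            keep = not rules[key]   # listed explicitly: kept unless ignored
        else:
            keep = wildcard         # unlisted: kept only when * is present
        if keep:
            result.append(tag)
    return result


# First approach: extract everything except heading3 and list-item.
first = [("*", False), ("heading3", True), ("list-item", True)]
# Second approach: extract only the listed elements.
second = [("header", False), ("body", False)]
```

With the first specification, a tag such as title is kept because of the asterisk; with the second, the same tag is dropped because it is not listed.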
The attribute keyword specifies extraction or exclusion of attributes within a tag. It is entered in the style.zon file as a child of element and uses the following syntax:
- attribute: attributename
where attributename specifies the name of the attribute you want to extract as a zone. Attribute names are case-insensitive. To extract all attribute names as zones, use * for attributename. You can use the following optional modifiers with the attribute keyword.
Modifier | Description |
---|---|
/ignore | Specify YES to ignore the specified attribute. To extract the attribute, specify NO (default). If you use the asterisk for attributename, only those attributes specified with the /ignore=yes modifier are ignored. If you do not use the asterisk, all the attributes specified are extracted and those omitted are ignored. |
/field | Specify YES to extract the specified attribute as a field value as well as a zone. See "Defining Zones for Virtual Documents" in this chapter. When a /field=YES modifier is assigned to an attribute, the attribute name and value are prepended to the field value named by the element name. NOTE: Using /field=YES does not cause the attribute information to be extracted into its own field. |
/default | Specify the default attribute value you want to use when the attribute name does not occur in the zone tag. |
/values | Specify values that may appear in a tag without the corresponding attribute name. |
- $control: 1
- zonespec:
- {
- element: header
- element: body
- element: title
- {
- attribute: company
- /default: "IBM"
- }
- element: textzone
- element: section
- element: sub-section
- element: footnote
- element: appendix
- }
- $$
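The following rules apply at the attribute keyword level:
If the /ignoreattributes=yes modifier is specified, then all attributes are ignored.
If an element is ignored (with the /ignore=yes modifier), then that element's attributes are ignored as well.
To extract a specific attribute, specify /ignore=no for that attribute.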
The entity keyword specifies the translation of entities to their equivalents. It uses the following syntax:
- entity: name "value"
where name is the name of the entity as it appears in the document, and value is the way you want the entity to display. You can use the following optional modifiers with the entity keyword.

Entities in SGML are used to specify characters that would otherwise be considered part of the markup language or that cannot be typed on a normal keyboard.

The entity begins with an ampersand (&) and ends with a semicolon (;) or white space. No space is permitted between the ampersand and the entity name that follows it. The entity name consists of alphanumeric characters plus any combination of underscores, dashes, and number signs (#). If the entity is terminated with a semicolon, the semicolon is part of the string that is replaced by the equivalent string. If the entity is terminated by a whitespace character, that whitespace is not part of the string that is replaced.

For example, assume the following entities and their translations:
Entity | Translation |
---|---|
amp | & |
lessthan | < |
greaterthan | > |

The style.zon file would then appear as follows:
- $control: 1
- zonespec:
- {
- entity: amp "&"
- entity: lessthan "<"
- entity: greaterthan ">"
- }
- $$
Assume the following sample document:
- Here is some text. First an entity delimited by a semicolon:
- S&amp;P's stock index. Second, entities delimited by spaces:
- the &greaterthan character and the &lessthan character.
After filtering with this style.zon file, the resulting document would appear as follows:
- Here is some text. First an entity delimited by a semicolon:
- S&P's stock index. Second, entities delimited by spaces:
- the > character and the < character.
If an entity is not defined in the style.zon file or in the built-in rules, then the text of that entity is passed through the filter unchanged.
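The delimiter rules for entities are easy to get wrong, so a small model may help. The following Python sketch reproduces the behavior described in this section under stated assumptions; it is not the filter's code. The function name translate_entities is hypothetical, entity names are matched exactly as written (the section does not say whether they are case-sensitive), and an entity at the very end of the text with no terminator is left alone.

```python
import re

# "&", then a name made of alphanumerics, underscores, dashes, or number signs,
# then either a semicolon (consumed) or whitespace (checked but not consumed).
ENTITY = re.compile(r"&([A-Za-z0-9_#-]+)(;|(?=\s))")


def translate_entities(text, table):
    """Replace each defined entity in `text` with its value from `table`."""
    def replace(match):
        name = match.group(1)
        if name not in table:
            # Undefined entities pass through unchanged.
            return match.group(0)
        return table[name]

    return ENTITY.sub(replace, text)


# Mirrors the entity statements in the sample style.zon file.
table = {"amp": "&", "lessthan": "<", "greaterthan": ">"}
```

Because the whitespace terminator is matched with a lookahead, the space after &greaterthan survives the replacement, while the semicolon after an entity such as amp is removed together with the entity.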
Dumping style.zon Information
You can print the settings of the style.zon file to the standard output. This can be useful in various circumstances, including debugging the style.zon file and modifying the behavior of built-in modes.

The style.zon file can be debugged using the -dump flag in the filter specification in the style.uni file. This operation is supported for debugging only, and other entries should not be modified. To obtain the style.zon file settings in effect, follow these steps:
With the -dump flag present in the style.uni file, the filter prints to the standard output a style.zon file with the settings that are in effect at the time of filtering. (Printing to standard output is not supported in a Windows DLL.) After the style.zon file is printed, the actual indexing does not take place.

The -dump option produces output in the character set of the prevailing locale. The output can be mapped to another character set using the -charmap option of mkvdk.
An example of the -dump flag in the style.uni file for the HTML mode is shown below:
- type: text/html
- /charset = guess
- /def-charset = 1252
- /content-filter = "zone -html -dump"
style.zon
file.
The mode is specified with the -mode argument in the style.uni file, assuming the universal filter is used. Otherwise, the specification is made in the style.dft file, as described earlier in "Using the Zone Filter."
Attribute values extracted with the /field=yes modifier are stored in the field named by the element keyword. This means the element value and the one or more attribute values are stored together in the same collection field. Consider the following style.zon file:
- $control: 1
- zonespec:
- /ignoreattributes = no
- {
- element: name
- /field = yes
- {
- attribute: first
- /field = yes
- attribute: last
- /field = yes
- }
- }
- $$
Assume the following sample document:
- This is <name first="emily" last="shaffer">AAA</name>here.
- Another is <name first="al" last="jones">ZZZ</name>here.
The /field=yes modifier on the element statement indicates that the field is populated (zone contents are extracted and stored in the field).
Attributes with the /field=YES modifier defined cause an attribute name and value string to be prepended to the field. The format of the attribute string is:
Multiple occurrences of the name element in the document cause the field value to be overwritten. The last occurrence encountered is saved in the field. The sample document contains two occurrences of the name element, so the values in the second instance are saved.