The zone filter recognizes documents in Internet message format that conform to the RFC 822 standard. This includes most standard e-mail and Usenet news messages.
How the Zone Filter Parses Internet Message Format Documents
The zone filter parses the headers of Internet-style e-mail and Usenet news messages to create zones. For example, consider the following e-mail message:
- From johns@verity.com Thu Dec 15 11:38:18 1994
- From: John Smith <johns@verity.com>
- Received: (from johns@localhost) by grimaldi
- (8.6.6.Beta9/8.6.6.Beta9) id LAA12705 for johns; Thu, 15 Dec
- 1994 11:36:35 -0800
- Message-Id: <199412151936.LAA12705@grimaldi>
- Subject: test message
- To: johns (John Smith)
- Date: Thu, 15 Dec 1994 11:36:34 -0800 (PST)
- This is a test message.
- John
The zone filter creates the following zones from this message. The start and end of each zone are shown in brackets, for example [from-beg] and [from-end].
- From johns@verity.com Thu Dec 15 11:38:18 1994
- From: [from-beg] John Smith <johns@verity.com>
- [from-end] Received: (from johns@localhost) by grimaldi
- (8.6.6.Beta9/8.6.6.Beta9) id LAA12705 for johns; Thu, 15 Dec
- 1994 11:36:35 -0800
- Message-Id: <199412151936.LAA12705@grimaldi>
- Subject: [subject-beg] test message
- [subject-end] To: [to-beg] johns (John Smith)
- [to-end]Date: [date-beg] Thu, 15 Dec 1994 11:36:34 -0800 (PST)
- [date-end]
- This is a test message.
- John
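As a rough illustration of the zones shown above, the following sketch uses Python's standard `email` package (not the zone filter itself) to pull the same four header values out of the sample message. The function name `header_zones` and the `ZONE_HEADERS` tuple are hypothetical, and the choice of From, Subject, To, and Date as the extracted headers is an assumption read off the annotated example.

```python
from email import message_from_string

# The sample message from the text; continuation lines start with a tab.
RAW = (
    "From johns@verity.com Thu Dec 15 11:38:18 1994\n"
    "From: John Smith <johns@verity.com>\n"
    "Received: (from johns@localhost) by grimaldi\n"
    "\t(8.6.6.Beta9/8.6.6.Beta9) id LAA12705 for johns; Thu, 15 Dec\n"
    "\t1994 11:36:35 -0800\n"
    "Message-Id: <199412151936.LAA12705@grimaldi>\n"
    "Subject: test message\n"
    "To: johns (John Smith)\n"
    "Date: Thu, 15 Dec 1994 11:36:34 -0800 (PST)\n"
    "\n"
    "This is a test message.\n"
    "John\n"
)

# Assumption: these are the headers that become zones in the example.
ZONE_HEADERS = ("from", "subject", "to", "date")


def header_zones(raw, names=ZONE_HEADERS):
    """Map each requested header name to its value, if the header is present."""
    msg = message_from_string(raw)
    zones = {}
    for name in names:
        value = msg.get(name)  # header lookup is case-insensitive
        if value is not None:
            zones[name] = value.strip()
    return zones


if __name__ == "__main__":
    for name, value in header_zones(RAW).items():
        print(f"[{name}-beg] {value} [{name}-end]")
```

The Received and Message-Id headers are parsed but not returned, which mirrors the annotated message, where those lines carry no zone markers.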
Each header line has the following form:
- Header-line-name: data data data \n
- [<whitespace>more data, more data more data \n] ...
Header names are case insensitive (for example, to matches To). Optionally, the header line can be continued on the next line with a continuation line. Lines whose first character is a whitespace character are continuation lines. The text of the entire continuation line is included as part of the previous header line. For example, the To header line in the following e-mail spans multiple lines. Again, zone starts and ends are shown in brackets.
- From:[from-beg] John Smith <johns@verity.com>
- [from-end]Subject:[subject-beg] another test message
- [subject-end]To:[to-beg] johns (John Smith),
- toddq@verity.com (Todd Quidnunc),
- mick@verity.com (Mickey O'Donnicker),
- ralphp@verity.com (Ralph Poobah)
- [to-end]
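The continuation-line rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the zone filter's implementation: the function name `parse_header_zones` is hypothetical, continuation text is joined to the previous header with a single space (a simplification), and any line without a well-formed `name:` prefix, such as an mbox envelope line, is skipped.

```python
def parse_header_zones(text):
    """Return (lowercased header name, value) pairs, folding continuation lines."""
    zones = []
    in_header = False  # True while the previous line belongs to a recognized header
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header section
        if line[0] in " \t":
            # Continuation line: append its text to the previous header's value.
            if in_header:
                name, value = zones[-1]
                zones[-1] = (name, value + " " + line.strip())
            continue
        name, sep, value = line.partition(":")
        # A header name must be non-empty and contain no whitespace.
        in_header = bool(sep) and bool(name) and not any(c.isspace() for c in name)
        if in_header:
            zones.append((name.lower(), value.strip()))
    return zones


SAMPLE = (
    "From: John Smith <johns@verity.com>\n"
    "Subject: another test message\n"
    "To: johns (John Smith),\n"
    "    toddq@verity.com (Todd Quidnunc),\n"
    "    mick@verity.com (Mickey O'Donnicker),\n"
    "    ralphp@verity.com (Ralph Poobah)\n"
)

if __name__ == "__main__":
    for name, value in parse_header_zones(SAMPLE):
        print(f"[{name}-beg] {value} [{name}-end]")
```

Header names are lowercased on the way in, so a later lookup for `to` finds the `To` header, in line with the case-insensitive matching described above.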
Other header lines can be specified in the style.zon file, and they will be extracted as zones.
The following entry in the style.uni file is appropriate for e-mail documents:
- type: message/rfc822
- /charset = guess
- /def-charset = 1252
- /content-filter = "zone -email -nocharmap"
To define zones for header lines, use the header keyword in the style.zon file. The header keyword specifies extraction or exclusion of header lines. The syntax is as follows:
- header: headername
headername specifies the name of the header line you want to extract as a zone. Header names are case insensitive. To extract all header names as zones, use * for headername. You can use the following optional modifiers with the header keyword.
/ignore
The following is an example of the first approach, which uses the /ignore modifier:
- $control: 1
- zonespec:
- {
- header: *
- header: received
- /ignore = yes
- header: message-id
- /ignore = yes
- }
- $$
The following is an example of the second approach:
- $control: 1
- zonespec:
- {
- header: received
- header: message-id
- }
- $$
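To make the effect of the two zonespec examples concrete, the sketch below models header selection in Python. It is an assumption-laden illustration, not the zone filter's actual logic: the function `select_zones` and the `(pattern, ignore)` rule pairs are hypothetical, a specific header entry is assumed to override the `*` wildcard, and only the rules listed are modeled (any built-in default zones are left out).

```python
def select_zones(header_names, rules):
    """Return the lowercased header names that the rules would extract as zones.

    rules is a list of (pattern, ignore) pairs in file order. The pattern "*"
    matches any header. Assumption: a specific entry overrides the wildcard.
    """
    selected = []
    for header in header_names:
        key = header.lower()  # header names are case insensitive
        decision = None
        for pattern, ignore in rules:
            p = pattern.lower()
            if p == key:
                decision = not ignore  # a specific entry wins
                break
            if p == "*" and decision is None:
                decision = not ignore  # wildcard applies unless overridden
        if decision:
            selected.append(key)
    return selected


HEADERS = ["From", "Received", "Message-Id", "Subject", "To", "Date"]

# First example: extract everything, but ignore Received and Message-Id.
FIRST = [("*", False), ("received", True), ("message-id", True)]

# Second example: list only the headers to extract.
SECOND = [("received", False), ("message-id", False)]

if __name__ == "__main__":
    print(select_zones(HEADERS, FIRST))
    print(select_zones(HEADERS, SECOND))
```

Under these assumptions the first rule set keeps every header except the two marked with /ignore = yes, while the second keeps only the two headers it names.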
The following entry in the style.uni file is appropriate for Usenet news documents:
- type: message/news
- /charset = guess
- /def-charset = 1252
- /content-filter = "zone -news -nocharmap"
You can use the style.zon file and the header keyword to define zones, as described in "Custom Zone Definitions" later in this chapter.