-
PICS Profile Language Working Group - PicsRULZ
-
-
Editor:
-
Martin Presler-Marshall, IBM <mpresler@us.ibm.com>
-
Authors:
-
Christopher Evans, Microsoft <cevans@microsoft.com>
-
Alex Hopmann, Microsoft <alexhop@microsoft.com>
-
Martin Presler-Marshall, IBM <mpresler@us.ibm.com>
-
Paul Resnick, AT&T <presnick@research.att.com>
Status of this document
This document is a draft of the PicsRULZ filtering language. This language
has been officially renamed PICSRules.
Version 1.1 of PICSRules became
a W3C Recommendation in December 1997. This document is significant
in that there is at least one known implementation that used this
draft; that implementation is IBM Web Traffic Express version 1.0.
Note that this document has no official status. Version 1.0 of PicsRULZ was never
accepted as a standard or recommendation by any organization.
A list of current W3C Recommendations and other technical documents
can be found at http://www.w3.org/pub/WWW/TR/.
Abstract
This document defines a language for writing profiles, which are filtering
rules that allow or block access to URLs based on PICS labels that describe
those URLs.
Definitions
This specification uses the same words as RFC 1123 for defining the significance
of each particular requirement. These words are:
-
MUST
-
This word or the adjective "required" means that the item is an absolute
requirement of the specification.
-
SHOULD
-
This word or the adjective "recommended" means that there may exist valid
reasons in particular circumstances to ignore this item, but the full implications
should be understood and the case carefully weighed before choosing a different
course.
-
MAY
-
This word or the adjective "optional" means that this item is truly optional.
One vendor may choose to include the item because a particular marketplace
requires it or because it enhances the product, for example; another vendor
may omit the same item.
An implementation is not compliant if it fails to satisfy one or more of
the MUST requirements for the protocols it implements. An implementation
that satisfies all the MUST and all the SHOULD requirements for its protocols
is said to be "unconditionally compliant"; one that satisfies all the MUST
requirements but not all the SHOULD requirements for its protocols is said
to be "conditionally compliant."
The PicsRULZ language: examples
Example 1: Forbid access to certain URLs
1 (PicsRule-1.0
2 (
3 failURL ("http://www.grody.com" "http://www.gross.net")
4 Filter (Pass "Unless-Prohibited")
5 )
6 )
The numbers on the left are line numbers for ease of reference; they
aren't part of the actual rule.
This example forbids access to a specific set of URLs, without using
any PICS labels. Any URL that begins with either http://www.grody.com or
http://www.gross.net will be blocked; any other URLs are considered acceptable.
Example 2: Forbid access based on PICS labels
1 (PicsRule-1.0
2 (
3 serviceinfo (
4 "http://www.coolness.raleigh.ibm.com/ratings/V1.html"
5 shortname "Cool"
6 bureauURL "http://www.ics.raleigh.ibm.com/LabelBureau")
7 Filter (
8 Pass "Unless-Prohibited"
9 Block "((Cool.Coolness <= 3) or (Cool.Graphics >= 3))"
10 )
11 )
12 )
This rule checks the rating given to documents according to the "Cool"
rating service ("http://www.coolness.raleigh.ibm.com/ratings/V1.html"). Labels
will be fetched from the label bureau "http://www.ics.raleigh.ibm.com/LabelBureau".
Documents which are not sufficiently cool or have too many graphics will
be blocked. Everything else, including unlabeled documents, will be allowed.
Example 3: Allow access based on PICS labels: block everything else
1 (PicsRule-1.0
2 (
3 ServiceInfo (
4 name "http://www.coolness.raleigh.ibm.com/ratings/V1.html"
5 shortname "Cool"
6 bureauURL "http://www.ics.raleigh.ibm.com/labelbureau")
7 Filter (
8 Pass "((Cool.Coolness > 3) and (Cool.Graphics < 3))"
9 )
10 )
11 )
This rule also checks the rating given to documents according to the "Cool"
rating service. Here, only documents which are sufficiently cool and do
not have too many graphics will be allowed. Everything else, including
unlabeled documents, will be blocked.
Example 4: A more complex example
1 (PicsRule-1.0
2 (
3 failURL ("http://www.badnews.com/" "http://www.worsenews.com")
4 passURL ("http://www.rated-g.org/")
5 name (rulename "Example 4"
6 description "Example 4 from PicsRULZ spec; simply shows how PicsRULZ rules are formed. This rule is not actually intended for use by real users.")
7 source (sourceURL "http://martinm.raleigh.ibm.com/Rules/Example1.html")
8 ServiceInfo (name "http://coolness.raleigh.ibm.com/ratings/V1.html"
9 shortname "Cool"
10 bureauURL "http://www.ics.raleigh.ibm.com/labelbureau")
11 serviceinfo ("http://hotness.raleigh.ibm.com/ratings/V1.html"
12 shortname "Hot"
13 bureauURL "http://www.ics.raleigh.ibm.com/labelbureau")
14 Filter (Pass "(Hot.Hotness > 4) "
15 Block "((Cool.Coolness < 3) or (Cool.graphics = 1))"
16 )
17 )
18 )
Explanation of example
-
Line
-
Explanation
-
1
-
Defines this construct as a PICS rule, and gives the version number.
-
3
-
Provides a list of URLs which will be automatically blocked without examining
any PICS labels. The quoted items are URL prefixes, just as in generic
PICS labels.
-
4
-
Provides a list of URLs which are automatically passed by the user-agent;
symmetric to failURL.
-
5
-
Provides a short, human-readable name for this rule. There is no requirement
for uniqueness on this name; it's meant as a mnemonic for users when manipulating
rules in some sort of a user interface.
-
6
-
Provides a longer, human-readable description of this rule. This is meant
to be used for an explanation of the semantics of this rule.
-
7
-
Specifies "where the rule came from". This URL is intended to point to
a human-readable Web page which will give more information about this rule,
who created it, why it was created, possible updates, etc.
-
8-10
-
Defines the rating service "http://coolness.raleigh.ibm.com/ratings/V1.html",
with short name "Cool" and identifies a label bureau from which to fetch
its labels.
-
11-13
-
Defines the rating service "http://hotness.raleigh.ibm.com/ratings/V1.html",
with short name "Hot", and defines a label bureau that serves labels for
that service.
-
14-15
-
The filtering rule that examines labels. If lines 3 and 4 have not determined
whether the URL is OK, these lines are evaluated. A URL must pass the permissions
criteria and not fail the prohibitions criteria.
-
14
-
The permissions portion of the filtering rule. Only URLs that are labeled
as sufficiently "hot" will be permitted. URLs unlabeled on the "hotness"
scale will be rejected.
-
15
-
The prohibitions portion of the filtering rule. If the URL has passed the
permissions criteria, the prohibitions are considered. If a URL is labeled
as not cool enough or as having graphics = 1, it will be rejected. If it
is not labeled using the "cool" rating service, the item is accepted.
Full syntax
-
Let us first consider the basic underpinnings of a PicsRULZ rule, then
the general format of the rule, and finally the format of the filter clause.
Basic structure
-
PicsRULZ rules are based on a limited form of an S-expression, namely a
parenthesized attribute-value pair. The parenthesized attribute-value pairs
are allowed to be nested. They also contain a concept known as a "primary
value". The primary value for any attribute-value pair is a value whose
name is defined by the context; it can be thought of as the fundamental
value associated with a given clause. An attribute-value pair MUST contain
the primary key-value pair; it MAY contain additional key-value pairs.
The general grammar for these limited S-expressions is:
attrvalpair:: attribute whitespace value | primaryvalue
attribute:: alphanumchar
value:: quotedstring | '(' attrvalpair (whitespace attrvalpair)* ')'
primaryvalue:: quotedstring+ | '(' attrvalpair+ ')'
quotedstring:: ('"' notquotechars '"') | ("'" notquotechars "'")
alphanumchar:: alphanum+
whitespace:: ' ' | '\t' | '\r' | '\n'
alphanum:: '0' - '9' | 'A' - 'Z' | 'a' - 'z'
-
notquotechars :: any ASCII characters in the range 32-127 except ' and "
-
Note that all attribute names are case insensitive, while the case
of values MUST be preserved. However, individual clauses and/or attributes
MAY define their values to be case-insensitive.
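The grammar above can be illustrated with a small reader. The following is a minimal sketch in Python, not part of this specification; the function names tokenize and parse are hypothetical. It assumes that comments have already been removed and that quoted strings contain neither quote character, as notquotechars requires. Quoted strings keep their quotes so that later stages can tell them apart from attribute names.

```python
import re

# One token is a parenthesis, a quoted string (single or double quotes),
# or a bare word such as an attribute name.
_TOKEN = re.compile(r"""\(|\)|"[^"']*"|'[^"']*'|[^\s()"']+""")

def tokenize(text):
    """Split rule text into parentheses, quoted strings, and bare words."""
    return _TOKEN.findall(text)

def parse(tokens):
    """Build nested lists from a token list, one list per parenthesized group."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]
```

Interpreting the nested lists as attribute-value pairs, including the rule that an unnamed leading value is the primary value, is left to the caller.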
Comments
The PicsRULZ syntax, which will be presented below, has a facility for
descriptive text which can be shown to a user, in addition to various statements
which influence the behavior of user-agents. However, it is frequently
useful to have "source-level" comments - comments which are intended for
individuals writing and/or editing rules, but which are not intended for
display to end users. This is analogous to placing comments in source code;
in an effort to encourage rule authors to write clear rules, we provide
a facility for placing comments into PicsRULZ rules.
The syntax of a comment is:
comment:: '{' comment-text* '}'
comment-text:: any octets except '}'
Note that a result of the above syntax is that comments may not be nested.
Comments may appear anywhere in PicsRULZ rules. A user-agent MAY remove
the comments during lexical analysis of the rule; text within comments
MUST NOT influence the interpretation of the rule in any manner. Note also
that user-agents which generate or export PicsRULZ rules MAY choose
to strip out comments before generating, exporting, or transmitting them.
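Because comments cannot be nested, a user-agent can remove them with a single pass before parsing. The following Python sketch shows one way to do so; the function name strip_comments is hypothetical, and the sketch assumes that the characters '{' and '}' never occur inside quoted strings, a case this draft does not address.

```python
import re

def strip_comments(rule_text):
    """Remove every '{...}' comment; the first '}' always ends a comment."""
    return re.sub(r"\{[^}]*\}", "", rule_text)
```

Whitespace surrounding a removed comment is left in place, which is harmless because whitespace only separates tokens.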
PicsRULZ Rules
The general format of a PicsRULZ rule, in modified BNF, is as follows:
rule :: '(' 'PicsRule-' verMajor '.' verMinor rule-body ')'
verMajor :: integer
verMinor :: integer
rule-body :: '(' rule-clauses ')'
rule-clauses :: rule-clause+
rule-clause :: filter-clause |
fail-clause |
pass-clause |
name-clause |
source-clause |
service-clause |
javelin-extension-clause |
opt-extension-clause |
req-extension-clause |
extension-clause
filter-clause :: 'Filter' '(' attrvalpair+ ')'
fail-clause :: 'failURL' '(' attrvalpair+ ')'
url-list :: quotedURL+
pass-clause :: 'passURL' '(' attrvalpair+ ')'
name-clause :: 'name' '(' attrvalpair+ ')'
source-clause :: 'source' '(' attrvalpair+ ')'
service-clause :: 'serviceinfo' '(' attrvalpair+ ')'
javelin-extension-clause :: 'ibm-javelin-extensions' '(' attrvalpair+ ')'
opt-extension-clause :: 'optextension' '(' attrvalpair+ ')'
req-extension-clause :: 'reqextension' '(' attrvalpair+ ')'
extension-clause :: extension-clause-name '(' attrvalpair+ ')'
Semantics & details of individual clauses
-
Filter
-
This clause specifies the expression to be evaluated to determine if this
rule will block access to a given URL. The syntax and semantics of the
values associated with these attributes are discussed below.
-
There are 2 attributes, pass and block, defined for the Filter
clause. The primary attribute is pass, and its default value is
"Unless-Prohibited". The block attribute has no default value. The
values for both of these attributes, if present, MUST be quotedstrings;
in the syntax given below for expressions, it is assumed that the quotes
have been removed from the strings. These values must be quotedstrings
as their syntax does not fall into the attribute-value syntax that is used
for the rest of PicsRULZ rules.
-
FailURL
-
Rules may contain a list of URL prefixes which are to be explicitly blocked,
without even checking for labels on those documents. The FailURL
clause gives a list of quoted URL prefixes to block; any URL under consideration
which begins with any prefix in the url-list associated with a FailURL
clause will be blocked. This may seem to be outside the attribute-value
syntax laid down for general clauses, but it is a simple extension of it:
the only defined attribute for FailURL is URL, and its value
is a URL to be blocked. The URL attribute is allowed to occur multiple
times within a FailURL clause.
-
PassURL
-
Rules may contain a list of URL prefixes which are to be explicitly allowed,
without even checking for labels on those documents. The PassURL
clause gives a list of quoted URL prefixes to allow; any URL under consideration
which begins with any prefix in the url-list associated with a PassURL
clause will be allowed. This may seem to be outside the attribute-value
syntax laid down for general clauses, but it is a simple extension of it:
the only defined attribute for PassURL is URL, and its value
is a URL to be allowed. The URL attribute is allowed to occur multiple
times within a PassURL clause.
-
name
-
This clause provides a short, human-readable name for the rule being presented.
It is intended that these names could be shown on a user-agent's user interface,
to show a human operator which rules are loaded, active/inactive, etc.
-
There are 2 attributes, rulename and description, defined
for the name clause. Rulename is the primary attribute for
a name clause, and its value is the human-readable name of this
rule. The value for description is a more-detailed analogue of name;
it provides a human-readable description of the rule being presented. The
description is intended for display in a user-agent's user interface, to
allow a human operator to get some understanding of who created the rule,
its semantics, etc. The exact contents of the value associated with description
are left up to the rule author.
-
Note that this mechanism does not provide a transparent method for supporting
multiple national languages. This is intentionally not being addressed
in this version of PicsRULZ. If you wish to produce PicsRULZ-1.0 rules
in multiple languages, you will have to produce multiple copies of the
rule - one for each target language. We expect that this will be addressed
in a cleaner way in future versions of PicsRULZ.
-
source
-
This clause gives information about where the rule came from. There are
4 attributes defined for source: sourceURL, creationTool,
author, and lastModified. The primary attribute is sourceURL.
-
The sourceURL attribute gives the "rule's URL". It provides a location
where a human user of this rule can go to get more information about the
rule and/or its creator. The value of this attribute should be a URL where
a user can find a human-readable description of this rule.
-
The creationTool attribute gives the ability to identify the tool,
if any, that was used to create this rule. This is analogous to the User-Agent
string in HTTP. The value of the creationTool is a quoted string.
The string should be in the format toolname/version, as in
"Cool-PICS-Rule-Editor/1.04".
-
The author attribute gives the e-mail address of the individual
or organization who produced this rule. The value associated with this
attribute should be a quoted e-mail address.
-
The lastModified attribute gives the date and time that this rule
was last modified. The value must be a quoted-ISO-date, as defined in the
PICS-1.1 Label Syntax and Communication Protocols.
-
serviceinfo
-
This clause specifies information about a rating service. There are currently
5 attributes defined for serviceinfo: name, shortname,
bureauURL, ratfile, and defaultValue.
The primary attribute is name.
-
The name attribute is the servicename URL of a rating service. Its
value specifies the name of the service which is being described.
-
The shortname attribute gives a shorter name to this rating service.
The shortname will be used in writing filter clauses; its value
is a string. For example, for the rating service "http://coolness.raleigh.ibm.com/ratings/V1.html",
the shortname might be "Cool".
-
The bureauURL attribute specifies the URL of a label bureau that
has ratings from this rating service. The value for this attribute is the
URL of the label bureau; i.e., the URL to which label-bureau requests should
be sent. This attribute SHOULD be expected by user-agents to occur multiple
times.
-
The ratfile attribute presents the machine-readable rating system
description (also known as "RAT file") that is used by this rating service.
This may be specified in one of two ways: the value may be a quoted string
which contains the entire machine-readable service description, or it may
be of the syntax "[RATFile-URL]", where RATFile-URL is the
URL of the rating system description; a user-agent SHOULD assume that dereferencing
this URL will produce a document of type application/pics-service.
There is no default value for the ratfile attribute. If the quoted
string contains the machine-readable service description, then it SHOULD
be URL-encoded to escape quotes; in other words, double-quote characters
should be replaced with the string "%22".
-
The defaultValue attribute specifies a "default" value to be used
for categories in this rating service. The value for the defaultValue
attribute is called the "default value" for categories in this rating service.
The default value MUST NOT be used for documents for which no label is
available. The default value is only used when one or more labels are available
for a document, but the label(s) do not contain a value for a given category.
For example, if a rule calls for a "coolness" value for a document, and
the document includes an imbedded document which only gives a value for
the "graphics" category, then the defaultValue SHOULD be used instead.
The defaultValue construct has been created to reflect the fact
that there are existing rating services as of May 1997 which assume that
a default value will be used for omitted categories.
-
The IBM Javelin Proxy server version 1.0 defines one extension attribute
for the serviceinfo clause: available-with-content. Allowed
values are "yes" and "no". This is used as a hint to the server to optimize
label retrieval. If the labels from a rating service are frequently available
with the content (indicated by a "yes" value for this attribute), then
a request to a label bureau (if any) will be delayed until after the content
is available and checked for labels; if the content contained a label from
this service, then no request will be sent to the label bureau. The default
value for available-with-content is "no".
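The defaultValue semantics described above for serviceinfo can be sketched in Python as follows. This is an illustration only: the function name category_value and the representation of a label as a dictionary mapping category names to values are assumptions, not part of PicsRULZ. The sketch returns the value from the first label that rates the category; a user-agent that must consider several labels would need to do more.

```python
def category_value(labels, category, default_value=None):
    """Return the value to use for `category`, or None if nothing applies.

    `labels` is a list of dicts mapping category names to values, standing
    in for the labels available from one rating service (hypothetical form).
    """
    if not labels:
        # No label is available: the default value MUST NOT be used.
        return None
    for label in labels:
        if category in label:
            return label[category]
    # Labels exist, but none gives a value for this category.
    return default_value
```

For example, with a single label that rates only "graphics" and a defaultValue of 2, a request for "coolness" yields 2, while the same request with no labels at all yields nothing.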
-
ibm-javelin-extensions
-
This clause gives a set of extensions to PicsRULZ 1.0 that are implemented
by the IBM Javelin Proxy server version 1.0. There are 8 attributes defined
for ibm-javelin-extensions: use-expired, group-file,
applies-to, active-days, start-time, end-time,
active, and filter-local.
-
The use-expired attribute tells the rule evaluation engine whether
it can use expired labels to make a filtering decision. Allowed values
are "yes" and "no". A value of "yes" indicates that the rule evaluation
engine can use labels even if they have expired, while a value of "no"
(the default) indicates that expired labels are to be discarded by the
rule evaluation engine.
-
The group-file attribute specifies a group file, in standard ICS
format, to use for resolving users into groups. See the ICS 4.2 documentation
for a description of group files. The value to the group-file attribute
is the fully-qualified path of the group file in the local filesystem.
The group file must be stored locally.
-
The applies-to attribute indicates a set of user IDs, group IDs,
IP addresses, and/or hostnames of requesters that this rule applies to.
The default if this attribute is not specified is that the rule applies
to any request that comes through the proxy (but see the active
attribute below). The value for this attribute is specified in the same
syntax as is used for authentication rules in ICS.
-
The active-days, start-time, and end-time directives
allow a rule to be active for only certain time periods. An example might
be a rule in a corporate proxy server that applies during normal working
hours, or that only applies on weekends. The default values for these directives
are that the rule is active every day of the week, at all times during
the day.
-
The active-days directive gives a vector of days of the week for
which the rule is active. The vector MUST be exactly 7 characters long.
Position 0 applies to Sunday, position 1 to Monday, etc. A value of 0 in
a given position indicates that the rule is not active on that day of the
week, and a value of 1 indicates that it is active on that day of the week.
-
The start-time directive specifies the time of day that the rule
becomes active. The time is specified in 24-hour format, using HH:MM:SS
as the format.
-
The end-time directive specifies the time of day that the rule becomes
inactive. The time is specified in 24-hour format, using HH:MM:SS as the
format.
-
The active directive can be used to deactivate a rule. Allowed values
are "yes" and "no". The default is "yes". If a value of "no" is specified,
the rule is not active, and the server will not select this rule for any
requests. A value of "yes" indicates that the rule is eligible to be used
for incoming requests.
-
The filter-local directive indicates whether the proxy server should filter
requests for local files (requests that will be satisfied from the local
filesystem, but not from a proxy's cache). Allowed values are "yes" and
"no"; the default is "yes". If the value is "yes", the rule will automatically
pass all requests for local resources, while if the value is "no", the
rule will be applied normally for requests for local resources.
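The active, active-days, start-time, and end-time directives described above combine into a single test of whether a rule is eligible at a given moment. The following Python sketch is one possible reading, not the Javelin implementation; the function name rule_is_active and its parameters are hypothetical. It assumes that "yes" is matched without regard to case, that the rule becomes inactive exactly at end-time, and that the time window does not cross midnight.

```python
from datetime import datetime, time

def rule_is_active(now, active="yes", active_days="1111111",
                   start_time=None, end_time=None):
    """Return True if a rule with these directive values is active at `now`."""
    if active.lower() != "yes":
        return False
    if len(active_days) != 7:
        raise ValueError("active-days MUST be exactly 7 characters long")
    # Position 0 is Sunday; Python's weekday() counts Monday as 0.
    position = (now.weekday() + 1) % 7
    if active_days[position] != "1":
        return False
    current = now.time()
    # A missing start-time or end-time means no bound on that side.
    if start_time is not None and current < time.fromisoformat(start_time):
        return False
    if end_time is not None and current >= time.fromisoformat(end_time):
        return False
    return True
```

A rule meant for normal working hours could use active-days "0111110" with start-time "09:00:00" and end-time "17:00:00".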
-
-
-
opt-extension-clause
-
opt-extension-clause and req-extension-clause are the extension
mechanisms in PicsRULZ; they are modeled after the extension mechanism
in the PICS label format. More information on the extension mechanism is
given below.
-
-
The optextension clause has only one defined attribute: extension-name.
The value of the extension-name attribute specifies the name of
an extension that will be used by this rule. The name of the extension
is the quotedURL; this URL should point to a human-readable description
of this extension. URLs are used for extension names to ensure uniqueness
without requiring a central naming body. If a user-agent receives a rule
which contains an optextension which it does not recognize, the
user-agent should process the rule, ignoring any clauses it does not recognize.
This means that any optional extensions MUST use the S-expression syntax
given above, so as to not break existing parsers.
-
Note that declaring the use of an optional extension may appear to be redundant,
as unrecognized attribute-value pairs are discarded by user-agents. The
purpose of the optextension construct is for use as a documentation mechanism.
User-agents MAY also display the names of optional extensions used by a
rule, asking the user for confirmation, before making use of a rule.
-
req-extension-clause
-
This clause has only one defined attribute: extension-name. The
value of the extension-name attribute specifies the name of an extension
that will be used by this rule. The name of the extension is the quotedURL;
this URL should point to a human-readable description of this extension.
URLs are used for extension names to ensure uniqueness without requiring
a central naming body.
-
If a user-agent receives a rule which contains a reqextension which
it does not recognize, the user-agent should cease processing the rule
and discard it.
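The different treatment of optextension and reqextension can be sketched in Python as follows. The function name check_extensions and the use of a set of supported extension URLs are assumptions made for this illustration.

```python
def check_extensions(opt_extensions, req_extensions, supported):
    """Decide whether a rule can be processed given its declared extensions.

    Returns (process, ignored): `process` is False when the rule must be
    discarded, and `ignored` lists the unrecognized optional extensions.
    """
    unknown_required = [name for name in req_extensions if name not in supported]
    if unknown_required:
        # An unrecognized required extension means the rule is discarded.
        return False, []
    # Unrecognized optional extensions are ignored and the rule is processed.
    ignored = [name for name in opt_extensions if name not in supported]
    return True, ignored
```

A user-agent MAY show the ignored names to the user before making use of the rule.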
-
verMajor
-
The major version number of PicsRULZ which this rule conforms to. It is
expected that this version number will be '1' when this proposal is completed.
-
verMinor
-
The minor version number of PicsRULZ which this rule conforms to. It is
expected that this version number will be '0' when this proposal is completed.
Restrictions
The filter, name, and source clauses MUST NOT appear
more than once in a PicsRULZ rule. The FailURL, PassURL,
optextension, reqextension, and serviceinfo clauses
MAY appear more than once in a PicsRULZ rule. If FailURL appears multiple
times in a rule, the URL lists MUST be combined into a single list. The
same applies for multiple PassURL clauses.
PicsRULZ Filter Clauses
We define two attributes, Pass and Block, for a Filter
clause. Intuitively, if the URL is to be allowed, the available labels
must satisfy the permission condition and no available label can satisfy
the prohibition condition. In this section we define the syntax and semantics
of a Filter clause. The value for the Pass attribute is defined
by the nonterminal PassExpression, and the value for the Block
attribute is defined by the nonterminal BlockExpression:
PassExpression :: "Unless-Prohibited" | expression
BlockExpression :: expression
expression :: simple-expression | or-expression | and-expression
simple-expression :: '(' service '.' category op constant ')'
service :: any shortname defined in a serviceinfo clause within this rule
category :: any transmit-name for a category defined by the rating-system referred to by the matching service
op :: '>' | '<' | '=' | '!=' | '>=' | '=>' | '<=' | '=<' | 'all-equal' | 'none-equal' | 'includes'
constant :: [sign] alphanumchar ['.' alphanumchar]
or-expression :: '(' expression or expression [or expression]* ')'
or :: 'or' | '||'
and-expression :: '(' expression and expression [and expression]* ')'
and :: 'and' | '&&'
sign :: '-'
When evaluating a clause, the user-agent may use zero, one, or more labels
from a given rating service (for more details, see the control
flow section). A simple-expression evaluates to true if any
available label from the specified service satisfies the condition of the
expression.
We must deal with the situation where a simple-expression calls for
a value from a label, and either no label is available, or the available
labels do not have values for the specified category. In those situations,
the simple-expression evaluates to false. This leads to the
expected semantics: if a simple-expression has no associated label available,
that expression cannot contribute evidence toward either permitting or
prohibiting the URL.
If, for example, there is only a pass expression, which is just a single
simple-expression, no label for the pass expression means that the item
is not permitted (effectively, block unlabeled). If, instead, there is
only a block expression, which is just a simple-expression, no label means
that the item is not prohibited (allow unlabeled). These intuitive semantics
hold up even when the expressions are more complicated and even when both
a pass and a block expression are provided. We have explicitly decided
to omit the NOT operator because it would destroy these intuitive semantics
and make the handling of unlabeled situations very difficult for people
to think about.
Simple-expressions, as defined above, can use any types of operators
on any types of data. However, this is unimplementable and has debatable
semantics. Here we clarify the situation:
All of the operators defined in the op clause are valid on numeric,
single-valued categories. The semantics of each of the operators should
be obvious by inspection (except, perhaps, 'all-equal', 'none-equal', and
'includes' - but those will be explained in a moment); the result of applying
the operator will be a boolean value, true or false.
The operators 'all-equal', 'none-equal', and 'includes' are only meaningful
for categories which have the multivalue true attribute set. User-agents
MAY choose whatever interpretation they wish for categories which don't
have that attribute.
For categories which have the multivalue true attribute set, the
only allowed operators are '=', '!=', 'all-equal', 'none-equal', and 'includes'.
These operators are defined as follows:
-
'='
-
The '=' operator is defined to be true if the constant in
the expression matches any of the values for this category in the label
that's being used to evaluate this clause, and false otherwise.
-
'!='
-
The '!=' operator is defined to be true if it matches none
of the values for this category in the label that's being used to evaluate
this clause, and false otherwise.
-
'all-equal'
-
The 'all-equal' operator is defined to be true if the constant
in the expression matches all the values given for this category
in the label that's being used to evaluate this clause, and false
otherwise.
-
'none-equal'
-
The 'none-equal' operator is identical to the '!=' operator.
-
'includes'
-
The 'includes' operator is identical to the '=' operator.
The only operators defined on string-valued categories are '=' and '!='.
The operator '=' returns true if two strings are equal by
octet comparison, and false otherwise. '!=' returns the opposite:
false if two strings are equal by octet comparison, and true
otherwise.
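The operators allowed on multivalue categories, as defined above, can be sketched in Python as follows. The function name multivalue_op is hypothetical, and the values of a category are assumed to arrive as a list taken from a single label. An empty list is treated as "no value for the category", for which a simple-expression evaluates to false.

```python
def multivalue_op(op, values, constant):
    """Apply a multivalue operator to the `values` a label gives for a category."""
    if not values:
        # No value for the category: the expression is false.
        return False
    if op in ("=", "includes"):
        return any(v == constant for v in values)
    if op in ("!=", "none-equal"):
        return all(v != constant for v in values)
    if op == "all-equal":
        return all(v == constant for v in values)
    raise ValueError("operator not allowed on multivalue categories: " + op)
```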
A URL passes this filter if the pass-expression evaluates to true
and the block-expression evaluates to false. If the
pass-expression is "Unless-Prohibited", it evaluates true,
allowing everything that is not prohibited. If no block-expression
is provided, its value defaults to false, allowing everything
that is explicitly permitted. Note that if no pass-expression is
given, its default value was defined above as "Unless-Prohibited".
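The evaluation of simple-expressions on numeric, single-valued categories and the combination of the pass and block results can be sketched in Python as follows. The names simple_expression and url_passes_filter, and the representation of a label as a dictionary with "service" and "ratings" keys, are assumptions made for this illustration only.

```python
import operator

# Comparison operators for numeric, single-valued categories.
_OPS = {
    ">": operator.gt, "<": operator.lt, "=": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "=>": operator.ge, "<=": operator.le, "=<": operator.le,
}

def simple_expression(labels, service, category, op, constant):
    """True if any available label from `service` satisfies the comparison."""
    for label in labels:
        if label.get("service") != service:
            continue
        ratings = label.get("ratings", {})
        if category in ratings and _OPS[op](ratings[category], constant):
            return True
    # No label, or no value for the category: the expression is false.
    return False

def url_passes_filter(pass_value=True, block_value=False):
    """Combine evaluated Pass and Block results; the defaults model
    "Unless-Prohibited" and a missing Block expression."""
    return pass_value and not block_value
```

With Example 4, a document carrying only a "Hot" label with Hotness 5 satisfies the Pass expression, and the Block expression is false because no "Cool" label is available, so the document passes.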
Order of operations
The clauses in a PicsRULZ rule MUST be evaluated in the following order:
-
Compare the URL against the URLs listed in any FailURL clauses in the rule.
-
Compare the URL against the URLs listed in any PassURL clauses in the rule.
-
Evaluate the expression given in the Filter clause.
The outcome of this is that FailURL has highest precedence; if a URL matches
both a FailURL and a PassURL, then the document will be blocked.
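The evaluation order above can be sketched in Python as follows. The function name decide is hypothetical, the Filter clause is represented by a caller-supplied function, and prefixes are compared as plain strings, which is an assumption about how URL matching is performed.

```python
def decide(url, fail_prefixes, pass_prefixes, evaluate_filter):
    """Return True to allow `url` and False to block it."""
    # Step 1: FailURL has highest precedence.
    if any(url.startswith(prefix) for prefix in fail_prefixes):
        return False
    # Step 2: PassURL is consulted only if no FailURL prefix matched.
    if any(url.startswith(prefix) for prefix in pass_prefixes):
        return True
    # Step 3: otherwise the Filter clause decides.
    return evaluate_filter(url)
```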
Control Flow
-
The rule syntax and semantics given above define what can be placed in
a rule, and the meaning of those constructs. In order to process these
rules, a user-agent SHOULD adopt an internal data-flow as described here;
this will ease the implementation of expected extensions to PicsRULZ when
they become formalized.
-
The standard user-agent which processes PicsRULZ rules SHOULD have four
significant components: the rule parser, the label source,
label validators, and a rule evaluator. Their roles are:
-
Rule parser
-
Parses PicsRULZ rules, possibly loaded from saved configuration information
or over a network. In user-agents which may store multiple rules, such
as proxy servers, this component is also responsible for deciding which
rule to use for each specific request; subsequent modules assume that only
one rule is being applied.
-
Label source
-
This component is responsible for finding labels. It takes as input information
from the rule being evaluated; it MAY use this information to contact label
bureaus for labels. It MAY also find labels imbedded in HTML documents
or transmitted in datastreams (HTTP, NNTP) which support label transmission.
The output of this component is the set of labels which apply to the document
in question. Note that as there are multiple potential label sources, the
label source component may produce more than one label from a given service
for a given document. However, the label source component is responsible
for choosing the "most applicable" label (i.e., picking specific labels
over generic ones, and picking the most specific generic label if multiple
generic labels are available). This component will also have to understand
"default" labels should that proposal be adopted.
-
The label source will need to specify to the other components not only
the label itself, but also how the label was obtained (imbedded in content,
from a label bureau, etc).
-
Label validators
-
Label validators are responsible for determining which labels are acceptable.
No validators are defined in the PicsRULZ language, but we expect extensions
to be defined. For example, a label validator may be defined which rejects
labels that lack an authorized digital signature. Another possible validator
would examine whether a label's author has been vouched for by a trusted
third party.
-
Rule evaluator
-
The rule evaluator takes as input the labels that pass any validators,
and the Filter expression that the rule parser found in the rule. It evaluates
the permission and prohibition expressions and produces a pass/fail decision.
Each simple-expression is evaluated by checking if any valid label satisfies
the expression. As described above, any simple-expression for which no
valid label applies evaluates to "false".
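The data-flow between the components described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a required architecture: the function name process_request is hypothetical, each component is modeled as a plain function, and the rule parser is assumed to have already selected the single rule to apply.

```python
def process_request(url, rule, label_source, validators, evaluator):
    """Run one request through label source, validators, and rule evaluator."""
    # The label source finds the labels that apply to the document.
    labels = label_source(url, rule)
    # A label is kept only if every validator accepts it.
    valid = [label for label in labels
             if all(validator(label) for validator in validators)]
    # The rule evaluator produces the pass/fail decision.
    return evaluator(rule, valid)
```

With an empty list of validators, every label found by the label source reaches the rule evaluator, which matches the base language in which no validators are defined.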
Extension mechanism
-
Any network protocol needs a mechanism for extension. Here we present the
extension mechanism provided with PicsRULZ.
Background
-
PicsRULZ is structured as a nested set of attribute-value pairs. Unrecognized
attribute keywords are ignored by user-agents, and the associated values
can be discarded by a PicsRULZ parser, as all values will be in a known
syntax. The basic mechanism for extending PicsRULZ is to define new clauses
and/or attribute-value pairs, their context, and their meaning. All new
attribute-value pairs will be associated with a named extension. Names
of extensions will be unique by building atop DNS names; the attribute-value
pairs used by an extension will not be required to have globally-unique
names.
Details
-
To define a new extension:
-
Determine if the extension is optional or required. Optional extensions
may be ignored by user-agents which don't recognize the extension.
In order for an extension to be "optional", the semantics of a rule which
uses this extension must not be modified if the extension is ignored.
-
Name the extension. Extensions must have a unique URL assigned to
them. The URL should point to a human-readable document which explains
the extension in detail. The URL must be in a domain controlled
by the extension's creator, in order to ensure uniqueness of extension
names.
-
If an extension needs new clauses, define the extension-clause-name that
will be used for each new clause defined by this extension. Extensions
SHOULD define no more than one new clause.
-
Determine the new attribute-value pairs that this extension will define,
and which clauses those attribute-value pairs may appear in.
-
Define the semantics of each new attribute-value pair defined by this extension.
In particular, if this extension overrides existing parts of PicsRULZ,
then this behavior must be spelled out exactly.
Here's a simple example of a PicsRULZ rule that uses an optional extension:
1 (PicsRule-1.0
2 (
3 serviceinfo (
4 "http://www.coolness.raleigh.ibm.com/ratings/V1.html"
5 shortname "Cool"
6 bureauURL "http://www.ics.raleigh.ibm.com/Coolness")
7 Filter (Pass '((Cool.Coolness < 3) or (Cool.Graphics < 3))' )
8 optextension ("http://www.ics.raleigh.ibm.com/ICS/ICS42-extensions.html")
9 ibm-ics-extension (use-expired "YES" group-file "/etc/ics.grp")
10 )
11 )
This example makes use of an optional extension named "http://www.ics.raleigh.ibm.com/ICS/ICS42-extensions.html".
That extension presumably defines the keyword 'ibm-ics-extension',
which is given on line 9. User-agents which don't understand this extension
can simply ignore the ibm-ics-extension clause and its attribute-value
pairs.
Note that there is only one "level" to declaring extensions,
but attribute-value pairs defined by extensions may appear anywhere within
a PicsRULZ rule. That is, all extensions should declare themselves with
an optextension or reqextension clause within a rule-clause,
but the attributes defined by an extension may appear nested several layers
down within a rule. This is shown in the following, more elaborate, example:
1 (PicsRule-1.0
2 (
3 serviceinfo (
4 "http://www.coolness.raleigh.ibm.com/ratings/V1.html"
5 shortname "Cool"
6 bureauURL "http://www.ics.raleigh.ibm.com/Coolness")
7 Filter (Block '((Cool.Coolness < 3) or (Cool.Graphics < 3))'
ibm-ics-time "06:00:00-20:00:00")
8 optextension ("http://www.ics.raleigh.ibm.com/ICS/ICS43-extensions.html")
9 )
10 )
In this example, the extension named "http://www.ics.raleigh.ibm.com/ICS/ICS43-extensions.html"
is defined. It presumably defines a new attribute-value pair within a Filter
clause. The attribute is named ibm-ics-time; it uses a single quoted
string as its value, but it could use an entire S-expression as its value.
Please mail any comments on this draft to the members of the working
group; this list is available
on the Web.
Last modified: 4/22/1998