This section explains the style.tde syntax used to supply field extraction rules. A syntax template is followed by descriptions of the style.tde statements in alphabetical order. A statement is any word that may appear at the beginning of a line and is always immediately followed by a colon (:).

Syntax Template

The following syntax template includes the style.tde syntax relevant to mkvdk for extracting field values.
- $control: 1
- tde:
- {
- pre-process:
- {
- relative-path: yes|no
- }
- {
- datamap:
- /docsep = "pattern"
- /filter = "filter_name"
- /system = "system_call"
- /charmap = charmap_code
- {
- define: pattern_name "pattern"
- ...
- field: field_name FILENAME|TIME|FILETIME|FILESIZE|FILEPATH|PATTERN|LINE num|"pattern"
- /required = yes|no
- /which = [1|###|LAST|ALL]
- /string-before = "string"
- /string-between = "string"
- /string-after = "string"
- /default = "field_value"
- /alsowrite = [field_name|"field_names"]
- ...
- dispatch: field_name
- /required = yes|no
- /start-line = "string"
- /start-pattern = "string"
- /end-line = "string"
- /end-pattern = "string"
- /inclusive = yes|no
- ...
- }
- }
- }
- $$
style.tde file.
The datamap section of the style.tde file defines the document body text and the rules for populating field values from documents. The document body text is used to create a collection's full-word index. Field population rules populate fields in the collection's documents table, as defined by the style.ddd and style.ufl files.

The following statements may appear as children of datamap: define, dispatch, and field. Each is described later in this section.
- datamap:
- /docsep = "pattern"
- /filter = "filter_name"
- /system = "system_call"
- /charmap = charmap_code
- {
datamapping
- }
datamap statement. The define statement assigns a name to a regular expression. Use this name to represent the expression elsewhere in the datamap section of the style.tde file.
- define: pattern_name "pattern"
style.ddd or style.ufl file. The dispatch statement identifies the document body text to include in the full-word index. If there is no style.dft file, the document body text, as specified by the dispatch statement, is displayed for viewing.

The start and end of the document body can be identified by a line number or by a pattern written as a regular expression.
- dispatch: field_name
- /required = yes|no
- /start-line = "string"
- /start-pattern = "string"
- /end-line = "string"
- /end-pattern = "string"
- /inclusive = yes|no
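As a rough mental model of pattern-based body selection, the following Python sketch picks the lines between a start pattern and an end pattern. It is not mkvdk's implementation. The function name `select_body` and the sample lines are hypothetical, Python's `re` dialect stands in for the style.tde regular expression dialect, and the reading of /inclusive as "keep the matching lines themselves" is an assumption based only on the option name. Line-number boundaries (/start-line, /end-line) are not modeled.

```python
import re

def select_body(lines, start_pattern=None, end_pattern=None, inclusive=True):
    """Return the lines between the first start match and the next end match.

    Assumed semantics: with inclusive=True the matching lines are kept,
    with inclusive=False they are dropped.
    """
    first = 0          # index of the first body line
    search_from = 0    # where to begin looking for the end pattern
    if start_pattern is not None:
        hit = next((i for i, line in enumerate(lines)
                    if re.search(start_pattern, line)), None)
        if hit is None:
            return []  # start pattern never matched: no body in this model
        first = hit if inclusive else hit + 1
        search_from = hit + 1
    last = len(lines)  # exclusive end index
    if end_pattern is not None:
        hit = next((i for i in range(search_from, len(lines))
                    if re.search(end_pattern, lines[i])), None)
        if hit is not None:
            last = hit + 1 if inclusive else hit
    return lines[first:last]

# Hypothetical document split into lines.
sample = ["From: a", "BEGIN", "body one", "body two", "END", "footer"]
body = select_body(sample, r"^BEGIN", r"^END", inclusive=False)
```

With no patterns supplied, the sketch treats the whole document as the body, which is a modeling choice rather than documented mkvdk behavior.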
style.ufl file. A field keyword must be present for each field for which you want to store extracted values.
- field: field_name {FILENAME|FILEPATH|FILETIME|FILESIZE|LINE|PATTERN|TIME} num|"pattern"
- /required = yes|no
- /which = [1|###|LAST|ALL]
- /string-before = "string"
- /string-between = "string"
- /string-after = "string"
- /default = "field_value"
- /alsowrite = [field_name|"field_names"]
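The interplay between a defined pattern name, the PATTERN keyword, and the /which and /default modifiers can be pictured with a small Python model. This is a sketch under stated assumptions, not the extraction engine itself. The helper names `expand_defines` and `extract_field` are hypothetical, Python's `re` dialect stands in for the style.tde regular expression dialect, and the interpretation of /which (first, Nth, last, or all matches) and /default (value used when nothing matches) is inferred from the option names. The `{writename}` reference form and the `<E.*>` pattern are taken from the example in this section.

```python
import re

def expand_defines(pattern, defines):
    """Replace each {name} reference with the pattern registered under that name."""
    return re.sub(r"\{(\w+)\}", lambda m: defines[m.group(1)], pattern)

def extract_field(text, pattern, defines, which="1", default=None):
    """Model PATTERN extraction: choose a match according to `which`.

    Assumed semantics: "1" (or any number N) selects the Nth match,
    "LAST" the final match, "ALL" every match; `default` is returned
    when no suitable match exists.
    """
    matches = re.findall(expand_defines(pattern, defines), text)
    if not matches:
        return default
    if which == "ALL":
        return matches
    if which == "LAST":
        return matches[-1]
    index = int(which) - 1  # "1" means the first match
    return matches[index] if 0 <= index < len(matches) else default

# Pattern name and pattern as in the example in this section.
defines = {"writename": r"<E.*>"}
# Hypothetical document text.
text = "<Eliot> wrote this.\nReviewed by <Evans>."

first_writer = extract_field(text, "{writename}", defines)
last_writer = extract_field(text, "{writename}", defines, which="LAST")
```

Keeping the name expansion separate from the match selection mirrors the split between the define and field statements, so each piece can be reasoned about on its own.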
style.tde file. Note that no more than one pre-process statement should be included in the style.tde file. Each pre-process statement can have up to 1000 child statements.
- pre-process:
- {
pre-processing
- }
style.tde file. The tde statement should be the first non-comment line after the $control statement.
- # style.tde example
- $control: 1
- tde:
- {
- pre-process:
- {
- datamap:
- {
- define: writename "<E.*>"
- field: Writer PATTERN "{writename}"
- /required = yes
- dispatch: DOC
- }
- }
- }
- $$
- # style.ufl field definition to be used with style.tde above
- data-table: dad
- {
- varwidth: Writer dxa
- }