Data Format Description Language (DFDL) v1.0 Specification
OGF Proposed Recommendation GFD-P-R.174, January 31, 2011
Properties on DFDL annotations may be of one or more of the following types:
DFDL String Literal: The property value is a string that represents a sequence of literal bytes or characters which appear in the data stream.
DFDL Expression: The property value is a DFDL subset XPath 2.0 expression that returns a value derived from other property values and/or from the DFDL infoset.
DFDL Regular Expression: The property value is a regular expression that can be used as a pattern to calculate the length of an element by applying that pattern to the sequence of literal bytes or characters which appear in the data stream.
Enumeration: The property value is one of the allowed values listed in the property description. An enumeration is of type string unless otherwise stated.
Logical Value: The property value is a string that describes a logical value. The type of the logical value is one of the XML Schema simple types.
QName: The property value is an XML qualified name as specified in "Namespaces in XML" [XMLNS10].
Some properties accept a list or union of types:
List of <type>: The property value is a space-separated list of the specified type. When parsing, if more than one string literal in the list matches the portion of the data stream being evaluated, then the longest matching value in the list must be used. When unparsing, the first value in the list must be used.
DFDL Expression or <type>: The property value is a union of a DFDL expression and exactly one of the other types. The expression must resolve to a value of that other type.
Union of types: The property value is a union of two or more types, and which type applies depends on the value of another property. For example, dfdl:nilValue can be a list of DFDL string literals or a list of strings, depending on dfdl:nilKind.
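The parse and unparse rules for list-valued properties can be sketched as below. This is a minimal Python illustration, assuming the list items have already been converted from DFDL string literal syntax into plain strings; the function names `match_longest` and `unparse_value` are hypothetical and not part of any DFDL processor API.

```python
from __future__ import annotations


def match_longest(candidates: list[str], data: str, pos: int) -> str | None:
    """Parse rule: of the list values that match `data` at `pos`,
    return the longest; return None if none of them matches."""
    matches = [c for c in candidates if data.startswith(c, pos)]
    return max(matches, key=len) if matches else None


def unparse_value(candidates: list[str]) -> str:
    """Unparse rule: the first value in the list is always output."""
    return candidates[0]


# A hypothetical list-valued property "; ;;" split on spaces.
values = "; ;;".split(" ")
print(match_longest(values, "a;;b", 1))  # -> ;;
print(unparse_value(values))             # -> ;
```

Choosing the longest match on parse prevents a shorter value such as ";" from consuming only part of a longer value such as ";;" that is also in the list.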
DFDL String Literals represent a sequence of literal bytes or characters which appear in the data stream. This presents the following challenges
The DFDL specification defines a language for describing any arbitrary sequence of bytes and characters. The full grammar is supplied in Appendix E, but the essential details are given below.
A DFDL string literal can describe any of the following types of literal data in any combination:
6.3.1.1 Character strings in DFDL String Literals
A literal string in a DFDL schema is written in the character set encoding specified by the XML declaration at the start of the schema document, for example:
<?xml version="1.0" encoding="UTF-8" ?>
In this example, the DFDL schema is written in UTF-8, so any literal strings contained in it, and particularly string literals found in its representation property bindings in the format annotations, are expressed in UTF-8.
<?xml version="1.0" encoding="UTF-8" ?>
....
....
....<dfdl:format encoding='ebcdic-cp-us' separator=","/>
.....
When a DFDL processor uses the separator expressed in this manner, the string literal "," is translated into the character set encoding of the data it is separating, as specified by the encoding representation property. Hence, in this case, the processor searches the data for a character with code point 0x6B (the EBCDIC comma), not for the UTF-8 or Unicode comma (0x2C) that appears in the DFDL schema document file.
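The translation step can be illustrated with Python's built-in codecs. This sketch uses Python's `cp037` codec (IBM037, the EBCDIC code page that IANA also registers under the alias ebcdic-cp-us) as a stand-in for the `encoding='ebcdic-cp-us'` binding; the record content is a hypothetical example.

```python
# The separator "," as it is written in the UTF-8 schema document.
separator = ","

schema_bytes = separator.encode("utf-8")  # bytes stored in the schema file
data_bytes = separator.encode("cp037")    # bytes searched for in EBCDIC data

print(schema_bytes.hex())  # 2c
print(data_bytes.hex())    # 6b

# Locating the translated separator in a hypothetical EBCDIC record.
record = "ABC,DEF".encode("cp037")
print(record.index(data_bytes))  # 3
```

Searching the EBCDIC record for the untranslated byte 0x2C would fail to find the separator.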
Character strings can include bidirectional data.
6.3.1.2 DFDL Character Entities in String Literals
DFDL character entities specify a single Unicode character and provide a convenient way to specify code points that appear in the data stream but would be difficult to specify in XML strings, for example common non-printable characters, or code points such as 0x00 that are not valid in XML documents. DFDL entities are based on XML entities, which can also be used in a DFDL schema.
DfdlCharEntity ::= DfdlEntity | DecimalCodePoint | HexadecimalCodePoint
DfdlEntity ::= '%' DfdlEntityName ';'
DfdlEntityName ::= 'NUL' | 'SOH' | 'STX' | 'ETX' | 'EOT' | 'ENQ' | 'ACK' | 'BEL' | 'BS' | 'HT' | 'LF' | 'VT' | 'FF' | 'CR' | 'SO' | 'SI' | 'DLE' | 'DC1' | 'DC2' | 'DC3' | 'DC4' | 'NAK' | 'SYN' | 'ETB' | 'CAN' | 'EM' | 'SUB' | 'ESC' | 'FS' | 'GS' | 'RS' | 'US' | 'SP' | 'DEL' | 'NBSP' | 'NEL' | 'LS'
DecimalCodePoint ::= '%#' [0-9]+ ';'
HexadecimalCodePoint ::= '%#x' [0-9a-fA-F]+ ';'
%% - Inserts a single literal "%" into the string literal. This "%" is subject to character set translation as is any other character.
A HexadecimalCodePoint provides a hexadecimal representation of the character's code point in ISO/IEC 10646.
A DecimalCodePoint provides a decimal representation of the character's code point in ISO/IEC 10646.
A DfdlEntityName is one of the mnemonics given in the following table.
Mnemonic | Meaning | Unicode value |
---|---|---|
NUL | null | U+0000 |
SOH | start of heading | U+0001 |
STX | start of text | U+0002 |
ETX | end of text | U+0003 |
EOT | end of transmission | U+0004 |
ENQ | enquiry | U+0005 |
ACK | acknowledge | U+0006 |
BEL | bell | U+0007 |
BS | backspace | U+0008 |
HT | horizontal tab | U+0009 |
LF | line feed | U+000A |
VT | vertical tab | U+000B |
FF | form feed | U+000C |
CR | carriage return | U+000D |
SO | shift out | U+000E |
SI | shift in | U+000F |
DLE | data link escape | U+0010 |
DC1 | device control 1 | U+0011 |
DC2 | device control 2 | U+0012 |
DC3 | device control 3 | U+0013 |
DC4 | device control 4 | U+0014 |
NAK | negative acknowledge | U+0015 |
SYN | synchronous idle | U+0016 |
ETB | end of transmission block | U+0017 |
CAN | cancel | U+0018 |
EM | end of medium | U+0019 |
SUB | substitute | U+001A |
ESC | escape | U+001B |
FS | file separator | U+001C |
GS | group separator | U+001D |
RS | record separator | U+001E |
US | unit separator | U+001F |
SP | space | U+0020 |
DEL | delete | U+007F |
NBSP | no break space | U+00A0 |
NEL | next line | U+0085 |
LS | line separator | U+2028 |
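The grammar and mnemonic table above can be turned into a small expander. This is a minimal Python sketch, not a reference implementation: the function `expand_char_entities` is hypothetical, it handles only character entities and the `%%` escape, and it deliberately leaves byte value and character class entities (such as `%#r0D;` or `%NL;`) untouched. A complete processor would handle all entity kinds in a single left-to-right pass.

```python
import re

# Mnemonic -> code point, transcribed from the table above.
DFDL_ENTITIES = {
    "NUL": 0x00, "SOH": 0x01, "STX": 0x02, "ETX": 0x03, "EOT": 0x04,
    "ENQ": 0x05, "ACK": 0x06, "BEL": 0x07, "BS": 0x08, "HT": 0x09,
    "LF": 0x0A, "VT": 0x0B, "FF": 0x0C, "CR": 0x0D, "SO": 0x0E,
    "SI": 0x0F, "DLE": 0x10, "DC1": 0x11, "DC2": 0x12, "DC3": 0x13,
    "DC4": 0x14, "NAK": 0x15, "SYN": 0x16, "ETB": 0x17, "CAN": 0x18,
    "EM": 0x19, "SUB": 0x1A, "ESC": 0x1B, "FS": 0x1C, "GS": 0x1D,
    "RS": 0x1E, "US": 0x1F, "SP": 0x20, "DEL": 0x7F, "NBSP": 0xA0,
    "NEL": 0x85, "LS": 0x2028,
}

# One alternative per grammar production, plus '%%' for a literal percent.
# Names not in the table (e.g. character class names) do not match.
_ENTITY = re.compile(
    r"%%"                                   # literal '%'
    r"|%#x([0-9a-fA-F]+);"                  # HexadecimalCodePoint
    r"|%#([0-9]+);"                         # DecimalCodePoint
    r"|%(" + "|".join(map(re.escape, DFDL_ENTITIES)) + r");"  # DfdlEntity
)


def expand_char_entities(literal: str) -> str:
    """Replace DFDL character entities in `literal` with the characters
    they denote; all other text is returned unchanged."""
    def repl(m):
        if m.group(0) == "%%":
            return "%"
        if m.group(1) is not None:              # %#xHH...;
            return chr(int(m.group(1), 16))
        if m.group(2) is not None:              # %#NNN;
            return chr(int(m.group(2)))
        return chr(DFDL_ENTITIES[m.group(3)])   # %NAME;
    return _ENTITY.sub(repl, literal)


print(repr(expand_char_entities("%CR;%LF;")))  # '\r\n'
print(expand_char_entities("100%%"))           # 100%
print(expand_char_entities("%#x41;%#66;"))     # AB
```

The resulting characters are still subject to character set translation into the data encoding, exactly like characters typed directly into the literal.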
6.3.1.3 DFDL Character Class Entities in DFDL String Literals
The following DFDL character classes are provided to specify one or more characters from a set of related characters.
DfdlCharClass ::= '%' DfdlCharClassName ';'
DfdlCharClassName ::= 'NL' | 'WSP' | 'WSP*' | 'WSP+' | 'ES'
Mnemonic | Meaning | Unicode value |
---|---|---|
NL | Newline. On parse, any NL character or combination of characters. On unparse, the value of the dfdl:outputNewLine property is output. | |
WSP | Single whitespace. On parse, any one whitespace character. On unparse, a space (U+0020) is output. | • U+0009-U+000D (control characters) • U+0020 SPACE • U+0085 NEL • U+00A0 NBSP • U+1680 OGHAM SPACE MARK • U+180E MONGOLIAN VOWEL SEPARATOR • U+2000-U+200A (different sorts of spaces) • U+2028 LSP • U+2029 PSP • U+202F NARROW NBSP • U+205F MEDIUM MATHEMATICAL SPACE • U+3000 IDEOGRAPHIC SPACE |
WSP* | Optional whitespace. On parse, zero or more whitespace characters are ignored. On unparse, nothing is output. | Same as WSP |
WSP+ | Whitespace. On parse, one or more whitespace characters are ignored; it is a processing error if no whitespace character is found. On unparse, a space (U+0020) is output. | Same as WSP |
ES | Empty string. Used in space-separated lists when the empty string is one of the values (may only be used for the nilValue property). | |
Using these DFDL entities one can create string literals which are a mix of text and hex-specified data.
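The parse and unparse behavior of the WSP classes in the table above can be sketched by compiling a string literal into a Python regular expression. This is a hypothetical helper, not a DFDL processor API: it assumes the literal contains no other entities (including `%%`), and it omits `%NL;` because the exact set of newline sequences it matches is not listed in this section.

```python
import re

# Whitespace characters listed for %WSP; in the table above (regex syntax).
_WSP = (r"[\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E"
        r"\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]")

_CLASS_PATTERNS = {
    "%WSP;": _WSP,          # exactly one whitespace character
    "%WSP*;": _WSP + "*",   # zero or more whitespace characters
    "%WSP+;": _WSP + "+",   # one or more; no whitespace means no match
}

# Capturing group so that split() keeps the class entities in its output.
_CLASS_ENTITY = re.compile(r"(%WSP[*+]?;)")


def literal_to_parse_regex(literal: str) -> re.Pattern:
    """Parse: WSP classes become whitespace patterns, all other text
    is matched literally."""
    parts = _CLASS_ENTITY.split(literal)
    return re.compile("".join(
        _CLASS_PATTERNS.get(part, re.escape(part)) for part in parts))


def unparse_text(literal: str) -> str:
    """Unparse: %WSP; and %WSP+; output one space, %WSP*; outputs nothing."""
    return (literal.replace("%WSP*;", "")
                   .replace("%WSP+;", " ")
                   .replace("%WSP;", " "))


sep = literal_to_parse_regex("%WSP*;,%WSP*;")
print(sep.match("  ,\t").end())                # 4
print(repr(unparse_text("%WSP*;,%WSP*;")))     # ','
```

Note the asymmetry: on parse, `%WSP*;` accepts any amount of surrounding whitespace, while on unparse it contributes nothing to the output.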
6.3.1.4 DFDL Byte Value Entities in DFDL String Literals
DFDL byte value entities provide a way to specify a single byte as it appears in the data stream without any character set translation. To specify a string of byte values, a sequence of two or more byte value entities must be used.
ByteValue ::= '%#r' [0-9a-fA-F]{2} ';'
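The difference between translated characters and untranslated byte values can be shown with a short encoder. This is a minimal Python sketch with a hypothetical function name, assuming character entities have already been expanded and that the literal contains no `%%` escapes; `cp037` again stands in for an EBCDIC data encoding.

```python
import re

_BYTE_VALUE = re.compile(r"%#r([0-9a-fA-F]{2});")


def literal_to_bytes(literal: str, encoding: str) -> bytes:
    """Encode `literal` for the data stream: ordinary text is translated
    into `encoding`, while each %#rHH; entity contributes the raw byte HH
    without any character set translation."""
    out = bytearray()
    pos = 0
    for m in _BYTE_VALUE.finditer(literal):
        out += literal[pos:m.start()].encode(encoding)  # translated text
        out.append(int(m.group(1), 16))                  # untranslated byte
        pos = m.end()
    out += literal[pos:].encode(encoding)
    return bytes(out)


# The typed comma is translated to EBCDIC 0x6B; %#r2C; stays 0x2C.
print(literal_to_bytes(",%#r2C;", "cp037").hex())         # 6b2c
print(literal_to_bytes("AB%#r0D;%#r25;", "cp037").hex())  # c1c20d25
```

A string of byte values is written as consecutive `%#rHH;` entities, one per byte, as the grammar requires.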
Some DFDL properties allow DFDL expressions (see section 23, Expression Language) to be used so that the property value can be set dynamically at processing time.
The general syntax of an expression is "{" expression "}".
The rules for recognizing DFDL expressions are:
An expression must start with a '{' in the first position and end with a '}' in the last position.
A '{' in any other position is treated as a literal.
A '}' in any position other than the last position is treated as a literal.
'{{' as the first two characters is treated as the literal '{' and does not begin a DFDL expression.
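The recognition rules above amount to a simple check on the first and last characters of a property value. The following Python function is a hypothetical sketch of that check only; it does not parse or evaluate the expression body, and the path in the usage example is made up for illustration.

```python
from __future__ import annotations


def classify_property_value(value: str) -> tuple[str, str]:
    """Apply the recognition rules to a raw property value.

    Returns ("expression", body) where body is the text between the outer
    braces, or ("literal", text) when the value is not an expression.
    """
    if value.startswith("{{"):
        # A leading '{{' is an escaped literal '{', not an expression.
        return ("literal", value[1:])
    if len(value) >= 2 and value.startswith("{") and value.endswith("}"):
        return ("expression", value[1:-1])
    # Braces in any other position are literal characters.
    return ("literal", value)


print(classify_property_value("{ ../hdr/len }"))  # ('expression', ' ../hdr/len ')
print(classify_property_value("{{abc}"))          # ('literal', '{abc}')
print(classify_property_value("a{b}"))            # ('literal', 'a{b}')
```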
DFDL expressions reference other items in the infoset or augmented infoset using absolute or relative paths. Relative paths are evaluated when the component containing the expression is referenced, not when it is declared. For example, a global element may have a DFDL property whose value is an expression containing a relative path to another element. The relative path is evaluated when the global element is referenced from an element reference.
DFDL expressions that provide the value of DFDL properties in the dfdl:format annotation on the xs:schema element MAY NOT contain relative paths.
The dfdl:lengthPattern property expects a regular expression as its value. The DFDL regular expression language is defined in section 24, DFDL Regular Expressions.
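Using a pattern to determine an element's length can be sketched as an anchored match at the element's starting position, with the length taken as the extent of the match. This is a hypothetical Python illustration: Python's `re` syntax stands in for the DFDL regular expression language, which is not guaranteed to be identical, and the error raised on a non-match is this sketch's own choice rather than a rule taken from the specification.

```python
import re


def length_from_pattern(length_pattern: str, data: str, pos: int = 0) -> int:
    """Return the length of the element starting at `pos`, taken as the
    length of the pattern's match anchored at that position."""
    m = re.compile(length_pattern).match(data, pos)
    if m is None:
        # Hypothetical handling; not taken from the specification.
        raise ValueError("lengthPattern did not match at position %d" % pos)
    return m.end() - pos


print(length_from_pattern(r"[0-9]+", "12345ABC"))    # 5
print(length_from_pattern(r"[^,]*", "abc,def", 4))   # 3
```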
Copyright (C) Open Grid Forum (2005-2010). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the OGF or other organizations, except as needed for the purpose of developing Grid Recommendations in which case the procedures for copyrights defined in the OGF Document process must be followed, or as required to translate it into languages other than English.