Data Format Description Language (DFDL) v1.0 Specification
OGF Proposed Recommendation GFD-P-R.174, January 31, 2011


6.3 DFDL Properties

Properties on DFDL annotations may be one or more of the following types

Some properties accept a list or union of types

6.3.1 DFDL String Literals

DFDL String Literals represent a sequence of literal bytes or characters which appear in the data stream. This presents the following challenges

The DFDL specification defines a language for describing any arbitrary sequence of bytes and characters. The full grammar is supplied in Appendix E, but the essential details are given below.

A DFDL string literal can describe any of the following types of literal data in any combination:

6.3.1.1 Character strings in DFDL String Literals

A literal string in a DFDL Schema is written in the character set encoding specified by the XML directive that begins all XML documents:

<?xml version="1.0" encoding="UTF-8" ?>

In this example, the DFDL schema is written in UTF-8, so any literal strings contained in it, and particularly string literals found in its representation property bindings in the format annotations, are expressed in UTF-8.

However, these strings are used to describe features of text data that are commonly in other character sets. For example, we may have EBCDIC data that is comma separated. A comma in EBCDIC does not have the same character code as a Unicode comma. However, when the separator is specified as the string literal "," (comma) and the 'encoding' property is specified as 'ebcdic-cp-us', the data are separated by EBCDIC commas, regardless of the character set encoding used to write the DFDL Schema.

<?xml version="1.0" encoding="UTF-8" ?>
...
    <dfdl:format encoding='ebcdic-cp-us' separator=","/>
...

When a DFDL processor uses the separator expressed in this manner, the string literal "," is translated into the character set encoding of the data it is separating as specified by the encoding representation property. Hence, in this case we would be searching the data for a character with codepoint 0x6B (the EBCDIC comma), not a UTF-8 or Unicode (0x2C) comma which is what exists in the DFDL schema document file.
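
The translation described above can be illustrated with a minimal Python sketch. It assumes that Python's "cp037" codec is an acceptable stand-in for the 'ebcdic-cp-us' encoding; the function name split_fields is hypothetical and is not part of DFDL.

```python
# The separator exactly as it is written in the (UTF-8) DFDL schema document.
separator = ","

# Assumption: Python's "cp037" codec stands in for 'ebcdic-cp-us'.
data_encoding = "cp037"

# Bytes of the comma as stored in the schema file.
schema_bytes = separator.encode("utf-8")

# Bytes a processor would actually search for in the data stream.
data_bytes = separator.encode(data_encoding)


def split_fields(data):
    """Split EBCDIC data on the translated separator and decode each field."""
    return [field.decode(data_encoding) for field in data.split(data_bytes)]
```

The same string literal therefore yields different bytes depending on the 'encoding' property, which is why the schema author never has to write the EBCDIC code point explicitly.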

Character strings can include bidirectional data.

6.3.1.2 DFDL Character Entities in String Literals

A DFDL character entity specifies a single Unicode character and provides a convenient way to specify code points that appear in the data stream but would be difficult to specify in XML strings, for example common non-printable characters, or code points such as 0x00 that are not valid in XML documents. DFDL entities are based on XML entities, which can also be used in a DFDL schema.

DfdlCharEntity       ::= DfdlEntity | DecimalCodePoint | HexadecimalCodePoint

DfdlEntity           ::= '%' DfdlEntityName ';'

DfdlEntityName       ::= 'NUL' | 'SOH' | 'STX' | 'ETX' |
                         'EOT' | 'ENQ' | 'ACK' | 'BEL' |
                         'BS' | 'HT' | 'LF' | 'VT' | 'FF' |
                         'CR' | 'SO' | 'SI' | 'DLE' |
                         'DC1' | 'DC2' | 'DC3' | 'DC4' |
                         'NAK' | 'SYN' | 'ETB' | 'CAN' |
                         'EM' | 'SUB' | 'ESC' | 'FS' |
                         'GS' | 'RS' | 'US' | 'SP' |
                         'DEL' | 'NBSP' | 'NEL' | 'LS'

DecimalCodePoint     ::= '%#' [0-9]+ ';'

HexadecimalCodePoint ::= '%#x' [0-9a-fA-F]+ ';'

Table 2 DFDL Character Entity syntax

 

%% - Inserts a single literal "%" into the string literal. This "%" is subject to character set translation as is any other character.

A HexadecimalCodePoint provides a hexadecimal representation of the character's code point in ISO/IEC 10646.

A DecimalCodePoint provides a decimal representation of the character's code point in ISO/IEC 10646.

A DfdlEntityName is one of the mnemonics given in the following table.

Mnemonic  Meaning                    Unicode value
NUL       null                       U+0000
SOH       start of heading           U+0001
STX       start of text              U+0002
ETX       end of text                U+0003
EOT       end of transmission        U+0004
ENQ       enquiry                    U+0005
ACK       acknowledge                U+0006
BEL       bell                       U+0007
BS        backspace                  U+0008
HT        horizontal tab             U+0009
LF        line feed                  U+000A
VT        vertical tab               U+000B
FF        form feed                  U+000C
CR        carriage return            U+000D
SO        shift out                  U+000E
SI        shift in                   U+000F
DLE       data link escape           U+0010
DC1       device control 1           U+0011
DC2       device control 2           U+0012
DC3       device control 3           U+0013
DC4       device control 4           U+0014
NAK       negative acknowledge       U+0015
SYN       synchronous idle           U+0016
ETB       end of transmission block  U+0017
CAN       cancel                     U+0018
EM        end of medium              U+0019
SUB       substitute                 U+001A
ESC       escape                     U+001B
FS        file separator             U+001C
GS        group separator            U+001D
RS        record separator           U+001E
US        unit separator             U+001F
SP        space                      U+0020
DEL       delete                     U+007F
NBSP      no break space             U+00A0
NEL       next line                  U+0085
LS        line separator             U+2028

Table 3 DFDL Entities
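
As an illustration of the syntax in Table 2 and the mnemonics in Table 3, the following Python sketch expands DFDL character entities in a string literal into the Unicode characters they denote. The function name decode_char_entities is hypothetical. The sketch leaves anything it does not recognize untouched, so character class and byte value entities pass through unchanged.

```python
import re

# Mnemonics from Table 3. The first 33 names map to U+0000..U+0020 in order.
_C0_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US SP"
).split()
NAMED = {name: code for code, name in enumerate(_C0_NAMES)}
NAMED.update(DEL=0x7F, NBSP=0xA0, NEL=0x85, LS=0x2028)

# Alternatives, in order: "%%", hexadecimal, decimal, named entity.
ENTITY = re.compile(r"%(?:(%)|#x([0-9a-fA-F]+);|#([0-9]+);|([A-Z0-9]+);)")


def decode_char_entities(literal):
    """Replace DFDL character entities with the characters they denote."""

    def replace(match):
        if match.group(1):                      # "%%" is a literal percent
            return "%"
        if match.group(2):                      # %#x...; hexadecimal code point
            return chr(int(match.group(2), 16))
        if match.group(3):                      # %#...; decimal code point
            return chr(int(match.group(3)))
        name = match.group(4)                   # %NAME; mnemonic
        if name in NAMED:
            return chr(NAMED[name])
        return match.group(0)                   # unknown name: leave as written

    return ENTITY.sub(replace, literal)
```

The resulting characters are still subject to translation into the encoding of the data, as described in section 6.3.1.1.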

 

6.3.1.3 DFDL Character Class Entities in DFDL String Literals

The following DFDL character classes are provided to specify one or more characters from a set of related characters.

DfdlCharClass     ::= '%' DfdlCharClassName ';'

DfdlCharClassName ::= 'NL' | 'WSP' | 'WSP*' | 'WSP+' | 'ES'

Table 4 DFDL Character Class Entity syntax

 

Mnemonic: NL
Meaning: Newline. On parse, any NL character or combination of characters is matched. On unparse, the value of the dfdl:outputNewLine property is output.
Unicode value:

  • U+000A LF

  • U+000D CR

  • U+000D U+000A CRLF

  • U+0085 NEL

  • U+2028 LS

Mnemonic: WSP
Meaning: Single whitespace. On parse, any white space character is matched. On unparse, a space (U+0020) is output.
Unicode value:

  • U+0009-U+000D (control characters)

  • U+0020 SPACE

  • U+0085 NEL

  • U+00A0 NBSP

  • U+1680 OGHAM SPACE MARK

  • U+180E MONGOLIAN VOWEL SEPARATOR

  • U+2000-U+200A (different sorts of spaces)

  • U+2028 LINE SEPARATOR

  • U+2029 PARAGRAPH SEPARATOR

  • U+202F NARROW NBSP

  • U+205F MEDIUM MATHEMATICAL SPACE

  • U+3000 IDEOGRAPHIC SPACE

Mnemonic: WSP*
Meaning: Optional whitespaces. On parse, whitespace characters are ignored. On unparse, nothing is output.
Unicode value: Same as WSP

Mnemonic: WSP+
Meaning: Whitespaces. On parse, one or more whitespace characters are ignored. It is a processing error if no whitespace character is found. On unparse, a space (U+0020) is output.
Unicode value: Same as WSP

Mnemonic: ES
Meaning: Empty string. Used in space separated lists when the empty string is one of the values (may only be used for the nilValue property).

 
Table 5 DFDL Generic Entities

 

Using these DFDL entities one can create string literals which are a mix of text and hex-specified data.
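
The parse and unparse behaviour of the NL and WSP character classes in Table 5 can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the ES class is omitted because it applies only to the nilValue property, and the "%%" escape and other entity kinds are not handled.

```python
import re

# Characters listed for WSP in Table 5, written as a regex character class.
WSP = (
    r"[\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E"
    r"\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]"
)

# Regular expression used on parse for each character class entity.
PARSE_PATTERNS = {
    "%NL;": r"(?:\u000D\u000A|[\u000A\u000D\u0085\u2028])",
    "%WSP;": WSP,
    "%WSP*;": WSP + "*",
    "%WSP+;": WSP + "+",
}

CLASS_ENTITY = re.compile(r"%(?:NL|WSP[*+]?);")


def literal_to_parse_regex(literal):
    """Build a compiled regex that matches the literal on parse."""
    parts = []
    pos = 0
    for match in CLASS_ENTITY.finditer(literal):
        parts.append(re.escape(literal[pos:match.start()]))  # plain text
        parts.append(PARSE_PATTERNS[match.group(0)])         # character class
        pos = match.end()
    parts.append(re.escape(literal[pos:]))
    return re.compile("".join(parts))


def literal_to_unparse_text(literal, output_new_line="\n"):
    """Produce the text written on unparse; output_new_line stands for
    the value of the dfdl:outputNewLine property."""
    output = {"%NL;": output_new_line, "%WSP;": " ", "%WSP*;": "", "%WSP+;": " "}
    return CLASS_ENTITY.sub(lambda match: output[match.group(0)], literal)
```

For example, the literal "%WSP*;,%WSP*;" accepts a comma with or without surrounding whitespace on parse, but writes only the comma on unparse.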

6.3.1.4 DFDL Byte Value Entities in DFDL String Literals

DFDL byte value entities provide a way to specify a single byte as it appears in the data stream without any character set translation. To specify a string of byte values, a sequence of two or more byte value entities must be used.

ByteValue ::= '%#r' [0-9a-fA-F]{2} ';'

Table 6 DFDL Byte Value Entity syntax
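
The difference between byte value entities and ordinary characters can be sketched in Python: characters are translated into the encoding of the data, while each byte value entity contributes its byte unchanged. The function name literal_to_bytes is hypothetical, the use of Python's "cp037" codec for EBCDIC in the usage note is an assumption, and other entity kinds are not handled here.

```python
import re

# Matches one byte value entity, for example "%#rFF;".
BYTE_ENTITY = re.compile(r"%#r([0-9a-fA-F]{2});")


def literal_to_bytes(literal, encoding):
    """Convert a string literal to the bytes expected in the data stream."""
    out = bytearray()
    pos = 0
    for match in BYTE_ENTITY.finditer(literal):
        # Text between entities is subject to character set translation.
        out += literal[pos:match.start()].encode(encoding)
        # The byte value itself is copied without any translation.
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += literal[pos:].encode(encoding)
    return bytes(out)
```

With this sketch, literal_to_bytes(",%#r00;", "cp037") gives the EBCDIC comma 0x6B followed by the untranslated byte 0x00.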

 

6.3.2 DFDL Expressions

Some DFDL properties allow DFDL expressions (see section 23 Expression language) to be used so that the property can be set dynamically at processing time.

The general syntax of expressions is "{" expression "}"

The rules for recognizing DFDL expressions are

DFDL expressions reference other items in the infoset or augmented infoset using absolute or relative paths. Relative paths are evaluated when the component containing the expression is referenced, not when it is declared. For example, a global element may have a DFDL property which is an expression that contains a relative path to another element. The relative path is evaluated when the global element is referenced from an element reference.

DFDL expressions that are used to provide the value of DFDL properties in the dfdl:format annotation on the xs:schema MAY NOT contain relative paths.

6.3.3 DFDL Regular Expressions

The DFDL lengthPattern property expects a regular expression to be specified. The DFDL Regular Expression language is defined in section 24 DFDL Regular Expressions.

6.3.4 Enumerations in DFDL

Some DFDL properties accept an enumerated list of valid values. It is a schema definition error if a value other than one of the enumerated values is specified. The case of the specified value must match the enumeration. An enumeration is of type string unless otherwise stated.
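
The case-sensitive check described above can be sketched in Python. The exception class, the function name, and the use of a byteOrder property with the values 'bigEndian' and 'littleEndian' are chosen purely for illustration and are assumptions of this sketch.

```python
class SchemaDefinitionError(Exception):
    """Hypothetical exception standing in for a schema definition error."""


# Illustrative enumeration; the property and its values are assumptions here.
BYTE_ORDER_VALUES = ("bigEndian", "littleEndian")


def check_enum(property_name, value, allowed):
    """Return value if it exactly matches an enumerated value, else raise."""
    # A plain membership test on strings is case sensitive, as required.
    if value not in allowed:
        raise SchemaDefinitionError(
            "%s: %r is not one of %s" % (property_name, value, ", ".join(allowed))
        )
    return value
```

Under this sketch 'bigEndian' is accepted, while 'BigEndian' is rejected because its case does not match the enumeration.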

Copyright (C) Open Grid Forum (2005-2010). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the OGF or other organizations, except as needed for the purpose of developing Grid Recommendations in which case the procedures for copyrights defined in the OGF Document process must be followed, or as required to translate it into languages other than English.