Data Format Description Language (DFDL) v1.0 Specification
OGF Proposed Recommendation GFD-P-R.174, January 31, 2011
The goals of version 1.0 are as follows:
- Leverage XML technology and concepts
- Support very efficient parsers/formatters
- Avoid features that require unnecessary data copying
- Support round-tripping, that is, reading and writing data in a described
format using the same description
- Keep simple cases simple
- Simple descriptions should be "human readable" to the same degree
that XSDL is.
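To illustrate the round-tripping goal, the sketch below drives both parsing and unparsing from one format description. The description is a hypothetical Python list of field names paired with standard-library `struct` codes, not DFDL syntax, and the field names `id`, `flags`, and `temp` are invented for the example. Because both directions read the same description, unparsing the parsed result reproduces the original bytes.

```python
import struct

# Hypothetical format description (not DFDL syntax): (field name, struct code).
# ">" selects big-endian byte order with no padding between fields.
DESCRIPTION = [("id", "I"), ("flags", "H"), ("temp", "h")]
LAYOUT = ">" + "".join(code for _, code in DESCRIPTION)


def parse(data: bytes) -> dict:
    """Read bytes into named values using the description."""
    values = struct.unpack(LAYOUT, data)
    return {name: value for (name, _), value in zip(DESCRIPTION, values)}


def unparse(record: dict) -> bytes:
    """Write named values back to bytes using the same description."""
    return struct.pack(LAYOUT, *(record[name] for name, _ in DESCRIPTION))


raw = bytes.fromhex("0000002a0001fff6")
record = parse(raw)
print(record)  # {'id': 42, 'flags': 1, 'temp': -10}
assert unparse(record) == raw
```

Keeping a single description for both directions is what makes the round trip hold: no separate writer can drift out of step with the reader.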
The general features of version 1.0 are as follows:
- Text and binary data parsing and unparsing
- Validation of the data, using XSDL validation, when parsing and unparsing
- Defaulted input and output for missing values
- Reference – use of a previously read value in subsequent expressions
- Choice – capability to select among format variations
- Hidden sequences of elements – description of an intermediate representation
not exposed in the final result
- Basic Math – in DFDL expressions
- Out-of-band value handling
- Speculative parsing to resolve uncertainty
- Very general parsing capability: Lookahead/Push-back
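The choice and speculative-parsing features listed above can be pictured with a minimal sketch: each alternative is attempted from the same input offset, and a failed attempt is discarded so the next alternative starts from the original position. This is a generic backtracking illustration in Python, assuming hypothetical branch functions (`date_branch`, `int_branch`) that return either a value and new offset or `None`; it does not model DFDL's own rules for resolving uncertainty.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical branch signature: given the input text and a start offset,
# return (value, offset_after_value) on success or None on failure.
Result = Optional[Tuple[object, int]]
Branch = Callable[[str, int], Result]


def is_ascii_digits(s: str) -> bool:
    """True if s is non-empty and contains only the characters 0-9."""
    return s.isascii() and s.isdigit()


def parse_choice(branches: Sequence[Branch], text: str, pos: int) -> Tuple[object, int]:
    """Speculatively try each branch from the same offset; first success wins."""
    for branch in branches:
        result = branch(text, pos)  # a failed attempt consumes nothing
        if result is not None:
            return result
    raise ValueError(f"no branch matched at offset {pos}")


def date_branch(text: str, pos: int) -> Result:
    """Hypothetical branch: a YYYY-MM-DD date."""
    chunk = text[pos:pos + 10]
    digits = chunk[:4] + chunk[5:7] + chunk[8:10]
    if len(chunk) == 10 and chunk[4] == "-" and chunk[7] == "-" and is_ascii_digits(digits):
        return ("date", chunk), pos + 10
    return None


def int_branch(text: str, pos: int) -> Result:
    """Hypothetical branch: an unsigned decimal integer."""
    end = pos
    while end < len(text) and is_ascii_digits(text[end]):
        end += 1
    return (("int", int(text[pos:end])), end) if end > pos else None


choices = [date_branch, int_branch]
print(parse_choice(choices, "2011-01-31", 0))  # (('date', '2011-01-31'), 10)
print(parse_choice(choices, "12345;", 0))      # (('int', 12345), 5)
```

Ordering matters in this sketch: the more specific date branch is tried before the integer branch, because an integer parse would otherwise succeed on the leading "2011" of a date.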
Version 1.0 of DFDL is a language capable of expressing a wide range
of binary and text-based data formats. DFDL can describe binary data
as found in the data structures of COBOL, C, PL/I, Fortran, and
similar languages. In particular, it can describe repeating sub-arrays
whose length is stored in another location of the structure.
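As an illustration of an array whose length is stored elsewhere in the structure (similar in spirit to a COBOL OCCURS DEPENDING ON clause), the sketch below parses a hypothetical big-endian record with the standard-library `struct` module. The layout and field names (`count`, `code`, `items`) are assumptions made for the example. The point is that the array size is read from an earlier field rather than fixed in advance.

```python
import struct
from typing import Any, Dict


def parse_record(data: bytes) -> Dict[str, Any]:
    """Parse a hypothetical big-endian layout:

    count : uint16  number of entries in `items`
    code  : uint16  an unrelated field between the count and the array
    items : `count` x int32
    """
    count, code = struct.unpack_from(">HH", data, 0)
    # The array length is not fixed; it comes from the earlier `count` field.
    items = list(struct.unpack_from(f">{count}i", data, 4))
    return {"count": count, "code": code, "items": items}


def unparse_record(record: Dict[str, Any]) -> bytes:
    """Write the record back, recomputing `count` from the array length."""
    items = record["items"]
    return struct.pack(f">HH{len(items)}i", len(items), record["code"], *items)


raw = struct.pack(">HH3i", 3, 7, 10, -20, 30)
record = parse_record(raw)
print(record)  # {'count': 3, 'code': 7, 'items': [10, -20, 30]}
assert unparse_record(record) == raw
```

In this sketch the writer derives `count` from the array instead of trusting the stored value, which keeps the two fields consistent when the array is modified before unparsing.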
DFDL can also describe a wide variety of textual data formats, such
as HL7, X12, and SWIFT. Textual data formats often use syntax
delimiters, such as initiators, separators, and terminators, to
delimit fields.
DFDL descriptions are composable: two formats can be nested or
concatenated, and the result is still a working format.
The following topics have been deferred
to future versions of the standard:
- Extensibility: The standard DFDL is derived from experience with real,
proprietary data format description languages. However, there are no
examples of extensible format description languages. Therefore, while
extensibility is desirable in DFDL, there is not yet a base of experience
with extensibility from which to derive a standard.
- Rich Layering: Some formats require data to be described in multiple
passes. Combining these passes into one DFDL schema requires very rich
layering functionality, in which the value of one element becomes the
representation of another element. DFDL v1.0 can describe only a
limited kind of layering.
Copyright (C) Open Grid Forum (2005-2010). All Rights
Reserved.
This document and translations of it may be copied
and furnished to others, and derivative works that comment on or otherwise
explain it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the OGF or other organizations,
except as needed for the purpose of developing Grid Recommendations
in which case the procedures for copyrights defined in the OGF Document
process must be followed, or as required to translate it into languages
other than English.