Data Format Description Language (DFDL) v1.0 Specification
OGF Proposed Recommendation GFD-P-R.174, January 31, 2011


9.1 Parser Overview

The DFDL logical parser is a recursive-descent parser [RDP] with guided but potentially unbounded look ahead, which is used to resolve points of uncertainty (see 9.1.1 Resolving Points of Uncertainty). A DFDL parser reads a specification (the DFDL schema) and recursively walks down and up the schema as it processes the data. This is done in a manner consistent with the scoping of properties and variables described in Section 8 Property Scoping Rules.

The unbounded look ahead means that there are situations where the parser must speculatively attempt to parse data. If a processing error occurs during such an attempt, the parser suppresses the error, backs out, and makes another attempt.
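The speculative behavior described above can be illustrated with a minimal sketch. It is not part of the specification: the names ProcessingError, parse_literal and attempt are hypothetical, and a parser is assumed to be a function that takes the data and an offset and returns a value with the new offset.

```python
class ProcessingError(Exception):
    """Hypothetical error raised when the data does not match a component."""


def parse_literal(expected):
    """Return a parser that consumes `expected` at the current offset."""
    def parser(data, pos):
        if data.startswith(expected, pos):
            return expected, pos + len(expected)
        raise ProcessingError(f"expected {expected!r} at offset {pos}")
    return parser


def attempt(parser, data, pos):
    """Speculatively run `parser` at `pos`.

    Returns (value, new_pos) on success. On a processing error the error is
    suppressed and None is returned; `pos` is unchanged, so the caller can
    make another attempt from the same place.
    """
    try:
        return parser(data, pos)
    except ProcessingError:
        return None


data = "BETA:1"
first = attempt(parse_literal("ALPHA:"), data, 0)   # fails, error suppressed
second = attempt(parse_literal("BETA:"), data, 0)   # retried from offset 0
```

Because the offset is passed by value, backing out needs no extra bookkeeping in this sketch; an implementation working on a stream would have to restore its position explicitly.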

Implementations of DFDL may provide control mechanisms for limiting the speculative search behavior of DFDL parsers. The nature of these mechanisms is beyond the scope of the DFDL specification, which defines the behavior of conforming parsers only on correct data, that is, data that can be parsed without any effective processing errors.

The logical parser recursively descends the DFDL schema, beginning with the element declaration of the distinguished root node of the schema passed to the DFDL processor. The root is specified in an implementation-specific manner (see Section 18). Depending on the kind of schema construct encountered, the DFDL annotations on it, and the pre-existing context, the parser performs specific parsing operations on the data stream. These parsing operations typically recognize and consume data from the stream and construct values in the logical model. For values of complex types and for arrays, these logical model values may incorporate values created by recursive parsing.

DFDL implementations are free to use whatever parsing techniques they wish, so long as the semantics are equivalent to those of the speculative recursive-descent logical parser described in this specification. Implementations are required to distinguish the various kinds of errors (schema definition error, processing error, etc.) regardless of when they are detected. Some implementations may not detect certain schema definition errors until data are being parsed; however, they must still distinguish schema definition errors (which indicate that the schema itself is not meaningful) from parsing errors (which indicate that the input data does not satisfy the requirements of the schema) and unparsing errors (which indicate that the infoset does not satisfy the requirements of the schema).
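One possible way to keep these error kinds distinct is a small exception hierarchy, sketched below. All class and function names are hypothetical. The sketch also assumes that parsing and unparsing errors are both kinds of processing error, and that only processing errors are suppressed during speculation, so a schema definition error detected late still surfaces as such.

```python
class DFDLError(Exception):
    """Hypothetical base class for every error in this sketch."""


class SchemaDefinitionError(DFDLError):
    """The schema itself is not meaningful."""


class ProcessingError(DFDLError):
    """Assumed common base for errors caused by the data or the infoset."""


class ParseError(ProcessingError):
    """The input data does not satisfy the requirements of the schema."""


class UnparseError(ProcessingError):
    """The infoset does not satisfy the requirements of the schema."""


def speculate(action):
    """Run `action`, suppressing processing errors only (sketch assumption)."""
    try:
        return action()
    except ProcessingError:
        return None


def bad_data():
    raise ParseError("text 'abc' cannot be converted to xs:int")


def bad_schema():
    # Detected while data are being parsed, yet still reported as a
    # schema definition error rather than as a parsing error.
    raise SchemaDefinitionError("expression refers to an undeclared variable")
```

Here speculate(bad_data) returns None, while speculate(bad_schema) lets the SchemaDefinitionError propagate to the caller.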

9.1.1 Resolving Points of Uncertainty

A point of uncertainty occurs in the data stream when there is more than one schema component that might occur at that point. Points of uncertainty can be nested.

A point of uncertainty is caused when one of the following constructs is used in a DFDL schema:

  1. An xs:choice

  2. An unordered xs:sequence (dfdl:sequenceKind='unordered')

  3. An xs:element that is optional (xs:minOccurs=0, xs:maxOccurs=1)

  4. An xs:element that is an array with a variable number of occurrences (xs:minOccurs not equal to xs:maxOccurs, and xs:maxOccurs > 1)

  5. An xs:sequence containing one or more floating elements.

An xs:choice point of uncertainty is resolved by parsing each choice branch in schema definition order until one is known to exist. It is a processing error if none of the choice branches are known to exist.
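The choice rule can be sketched as follows. This is a simplified illustration, not a definitive implementation: ProcessingError, pattern_branch and parse_choice are hypothetical names, each branch is modeled as a regular expression, and a branch is treated as known to exist only when it parses completely (discriminators and initiated content are ignored).

```python
import re


class ProcessingError(Exception):
    """Hypothetical error raised when the data does not match a branch."""


def pattern_branch(pattern):
    """Build a branch parser that must match `pattern` at the current offset."""
    compiled = re.compile(pattern)

    def parser(data, pos):
        match = compiled.match(data, pos)
        if match is None:
            raise ProcessingError(f"no match for {pattern!r} at offset {pos}")
        return match.group(), match.end()

    return parser


def parse_choice(branches, data, pos):
    """Try each (name, parser) branch in schema definition order."""
    for name, parser in branches:
        try:
            value, new_pos = parser(data, pos)
        except ProcessingError:
            continue  # branch known not to exist; back out and try the next
        return name, value, new_pos
    # No branch is known to exist: this is itself a processing error.
    raise ProcessingError(f"no choice branch matched at offset {pos}")


branches = [
    ("number", pattern_branch(r"[0-9]+")),
    ("word", pattern_branch(r"[a-z]+")),
]
```

With these branches, parse_choice(branches, "abc", 0) selects the "word" branch after the "number" branch fails, and input that matches neither branch raises ProcessingError.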

An unordered xs:sequence point of uncertainty is resolved by parsing for the child components of the sequence in schema definition order at each point in the data stream where a component can exist until the required number of each child component is known to exist or the sequence is terminated by delimiters or specified length.
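A heavily simplified sketch of the unordered-sequence rule is shown below. It assumes that every child is required exactly once and that each child is a "name=value;" field, neither of which comes from the specification; ProcessingError, tagged and parse_unordered are hypothetical names. Termination by delimiters or specified length is not modeled.

```python
class ProcessingError(Exception):
    """Hypothetical error raised when the data does not match a child."""


def tagged(tag):
    """Build a parser for a 'tag=value;' field (an illustrative data format)."""
    prefix = tag + "="

    def parser(data, pos):
        if not data.startswith(prefix, pos):
            raise ProcessingError(f"expected {prefix!r} at offset {pos}")
        end = data.find(";", pos)
        if end == -1:
            raise ProcessingError(f"unterminated field at offset {pos}")
        return data[pos + len(prefix):end], end + 1

    return parser


def parse_unordered(children, data, pos):
    """At each point, try the children not yet found in schema definition order."""
    found = {}
    progressed = True
    while len(found) < len(children) and progressed:
        progressed = False
        for name, parser in children:
            if name in found:
                continue
            try:
                value, new_pos = parser(data, pos)
            except ProcessingError:
                continue  # this child does not occur here; try the next one
            found[name] = value
            pos = new_pos
            progressed = True
            break  # restart from the first child at the new position
    missing = [name for name, _ in children if name not in found]
    if missing:
        raise ProcessingError(f"required children not found: {missing}")
    return found, pos


children = [("x", tagged("x")), ("y", tagged("y"))]
result = parse_unordered(children, "y=2;x=1;", 0)
```

In this run the parser first fails to find "x" at offset 0, finds "y" instead, and then finds "x" at offset 4.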

An optional element point of uncertainty is resolved by parsing the element until it is either known to exist or known not to exist.

For an array element with a variable number of occurrences, the point of uncertainty is resolved for each occurrence separately. The array is known to exist if one of its occurrences exists.
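The per-occurrence resolution can be sketched as a loop that makes one speculative attempt per occurrence. The names ProcessingError, two_digits and parse_array are hypothetical, and the two-digit field is only an illustrative element format. For brevity the sketch treats every occurrence alike and checks the minimum afterwards.

```python
class ProcessingError(Exception):
    """Hypothetical error raised when an occurrence cannot be parsed."""


def two_digits(data, pos):
    """Illustrative element parser: a fixed-width, two-digit integer."""
    field = data[pos:pos + 2]
    if len(field) == 2 and field.isdigit():
        return int(field), pos + 2
    raise ProcessingError(f"expected two digits at offset {pos}")


def parse_array(element, data, pos, min_occurs, max_occurs):
    """Resolve each occurrence separately, up to `max_occurs` occurrences."""
    values = []
    while len(values) < max_occurs:
        try:
            value, new_pos = element(data, pos)
        except ProcessingError:
            break  # this occurrence is known not to exist; the array ends here
        values.append(value)
        pos = new_pos
    if len(values) < min_occurs:
        raise ProcessingError(
            f"found {len(values)} occurrences, expected at least {min_occurs}"
        )
    return values, pos


values, end = parse_array(two_digits, "010203;", 0, min_occurs=1, max_occurs=5)
```

In this sketch an optional element corresponds to calling parse_array with min_occurs=0 and max_occurs=1.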

A sequence with a floating child element point of uncertainty is resolved by parsing for the expected ordered component at that point in the data stream. If the expected component is known not to exist, then an instance of each floating component is parsed in schema definition order.

A component is known to exist when:

  1. All the syntax and content (initiator if defined, content, and terminator if defined) of the component are successfully parsed and any dfdl:assert, if defined, evaluates to true.

  2. A dfdl:discriminator on the component evaluates to true.

  3. An xs:sequence or xs:choice has dfdl:initiatedContent 'yes' and the initiator for the component is found.

A component is known not to exist when:

  1. A dfdl:assert on the component evaluates to false or a processing error occurs while evaluating the expression.

  2. A dfdl:discriminator on the component evaluates to false or a processing error occurs while evaluating the expression.

  3. An xs:sequence or xs:choice has dfdl:initiatedContent 'yes' and the initiator for the component is not found.

  4. A processing error occurs when parsing the component. Processing errors include, but are not limited to, failure to convert the data to the built-in logical type. Validation errors do not cause a component to be known not to exist.

DFDL discriminators are described in Section 7.4 The dfdl:discriminator Annotation Element.
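The two lists of existence conditions can be read as a decision over the events observed while a component is parsed. The sketch below is one possible reading, not a definitive implementation: the Existence enum, the event names and the resolve function are hypothetical, and the sketch assumes that the first decisive event in the order observed settles the question.

```python
from enum import Enum


class Existence(Enum):
    KNOWN_TO_EXIST = "known to exist"
    KNOWN_NOT_TO_EXIST = "known not to exist"
    UNDETERMINED = "undetermined"


def resolve(events, initiated_content=False):
    """Classify a component from (kind, ok) events in the order observed.

    Hypothetical event kinds: "discriminator", "initiator", "assert",
    "processing_error", "validation_error", "parsed".
    """
    for kind, ok in events:
        if kind == "discriminator":
            # True means known to exist; false (or an error while
            # evaluating, reported as ok=False) means known not to exist.
            return Existence.KNOWN_TO_EXIST if ok else Existence.KNOWN_NOT_TO_EXIST
        if kind == "initiator" and initiated_content:
            # Decisive only when dfdl:initiatedContent is 'yes'.
            return Existence.KNOWN_TO_EXIST if ok else Existence.KNOWN_NOT_TO_EXIST
        if kind == "assert" and not ok:
            return Existence.KNOWN_NOT_TO_EXIST
        if kind == "processing_error":
            return Existence.KNOWN_NOT_TO_EXIST
        if kind == "parsed" and ok:
            # All syntax and content parsed and no assert has failed.
            return Existence.KNOWN_TO_EXIST
        # Validation errors and passing asserts are not decisive.
    return Existence.UNDETERMINED
```

For example, a validation error followed by a successful parse still yields KNOWN_TO_EXIST, because validation errors do not cause a component to be known not to exist.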


Copyright (C) Open Grid Forum (2005-2010). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the OGF or other organizations, except as needed for the purpose of developing Grid Recommendations in which case the procedures for copyrights defined in the OGF Document process must be followed, or as required to translate it into languages other than English.