Data Format Description Language (DFDL) v1.0 Specification
OGF Proposed Recommendation GFD-P-R.174, January 31, 2011


3. Glossary

Adjacent - Two parts of the input/output stream are adjacent if they are at consecutive addresses.

Addressable Unit, or Unit - This is the unit of storage that makes up the input or output stream holding the representation of the data. The units are bits, bytes, or characters.

Applicable properties - All the DFDL properties that apply to a given type of schema construct. For example, all the DFDL properties that apply to an xs:simpleType.

Array - The set of adjacent elements whose XSDL element declaration specifies the potential for more than one occurrence (xs:maxOccurs > 1 or 'unbounded'). Any given array instance can have any number of elements, including zero elements or exactly one element, as long as the occurrence constraints are met. If xs:maxOccurs is 'unbounded' then there is no constraint on the maximum number of occurrences. An optional element (xs:maxOccurs=1, xs:minOccurs=0) is not considered to be an array as described in this document. (The term that generalizes variable-occurrence arrays and optional elements is 'variable-occurrence item'.) Note that a sequence is not to be confused with an array. A sequence is a complex tuple type for an element; the children of a sequence can be of different types. All elements of an array have the same type and have the same information item members except for the value member.

Array Element – an element declaration or reference with xs:maxOccurs>1 or unbounded.

Augmented infoset - When unparsing one begins with the DFDL schema and conceptually with the logical infoset. As the values of items are filled in by defaulting, and by use of the DFDL outputValueCalc property (including on hidden items), these new item values augment the infoset. The resulting infoset is called the augmented infoset.

Byte - The term “byte” refers to an 8-bit octet.

Component - A construct within a DFDL schema that may contain a DFDL annotation.

Content - The content is the bits of data that are interpreted to compute a logical value.

Contiguous - An element has a contiguous representation if all parts of its representation are adjacent in the input/output stream. Most simple types have contiguous representations naturally. Groups containing elements that are themselves contiguous are also considered to have contiguous representations irrespective of alignment fill or padding of any kind that exists within the group. Similarly, arrays containing elements that are themselves contiguous are also contiguous. An example of a non-contiguous representation would be a nillable element, where a flag is used to determine whether or not the element is nil, and the location of that flag is not adjacent to the value representation.

Delimiter - A character or string used to separate, or mark the start and end of, items of data. In DFDL, dfdl:lengthKind 'delimited' searches for separators and terminators.

Delimiter scanning - When parsing, the process of searching the data for a specific item that marks the end of an item, or the beginning of a subsequent item, is referred to as delimiter scanning, or simply scanning for short. Scanning also takes escape schemes into account, so that delimiters can appear within data if properly escaped.
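
The following Python sketch illustrates the idea of delimiter scanning with an escape scheme. It is not taken from the specification: the function name scan_for_delimiter, the single-character escape scheme, and the assumption that both the delimiter and the escape string are non-empty are all hypothetical choices made for illustration.

```python
from typing import Optional, Tuple


def scan_for_delimiter(
    data: str, delimiter: str, escape: str = "\\"
) -> Tuple[str, Optional[int]]:
    """Scan data for the first unescaped delimiter.

    Returns (content, index_of_delimiter), or (content, None) when no
    unescaped delimiter is present. Hypothetical helper: the escape string
    makes the single character that follows it literal.
    """
    content = []
    i = 0
    while i < len(data):
        if data.startswith(escape, i) and i + len(escape) < len(data):
            # Escaped character: keep it as data and skip past it.
            content.append(data[i + len(escape)])
            i += len(escape) + 1
        elif data.startswith(delimiter, i):
            # Unescaped delimiter: the item ends here.
            return "".join(content), i
        else:
            content.append(data[i])
            i += 1
    return "".join(content), None
```

For the input a\,b,c with a comma delimiter, the escaped comma is kept as data and the scan stops at the second comma.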

DFDL – Data Format Description Language

DFDL Processor - A program that uses DFDL schemas in order to process data described by them.

DFDL Schema - an XML schema containing DFDL annotations to describe data format.

Dynamic extent - This is a characteristic of the data stream. When parsing a declaration or definition, the collection of bits within the data stream that contain any aspect of the representation of that element make up the element's dynamic extent.

Dynamic scope - This is a characteristic of parts of the DFDL schema. When a definition or declaration contains or references another declaration or definition, then the contained definition or declaration is said to be in the dynamic scope of the enclosing one. The important characteristic of dynamic scoping is that it traverses references. When parsing, the dynamic scope of an element declaration includes all definitions and declarations used as part of parsing that element.

Element - A part of the data described by an element declaration in the schema and presented as an element information item in the infoset.

Explicit properties - The explicit properties are the combination of any defined locally on the annotation and any defined on the dfdl:defineFormat annotation referenced by a local dfdl:ref property.

Fixed-Occurrence Item - An array has a fixed number of occurrences when xs:minOccurs = xs:maxOccurs, or when the DFDL representation properties preclude a variable number of occurrences. An optional element has a fixed number of occurrences when the DFDL representation properties preclude a variable number of occurrences.

Format Annotations - the syntactic elements by which format information is decorated onto XML schemas.

Format Properties - the attributes on format annotations which specify characteristics of data format.

Framing - The term used to describe the delimiters, length fields, and other parts of the data stream which are present and may be necessary to determine the length or position of the content of an element.

Item - A DFDL information set consists of a number of information items, or just items for short.

Length - When discussing data items and their representations, the term 'length' is used to refer to the measure of the size of the representation of an item in units of bits, bytes, or characters. The length of an array is the number of bits, bytes, or characters making up its representation, and has nothing to do with the number of occurrences, or dimensionality, of the array. Any item or array has length. Only arrays and optional elements have occurrences.

Lexical scope - In a DFDL Schema document, the lexical scope of any element is the collection of schema declarations, definitions, and annotations contained within the element textually.

Local properties – Local properties are the properties defined on an annotation in either short, attribute or element form.

Logical layer - A DFDL Schema with all the DFDL annotations ignored is an ordinary XSDL schema. The logical structure described by this XSDL is called the DFDL logical layer.

Optional Element - this term refers to an element declaration or reference with xs:maxOccurs='1' and xs:minOccurs='0'.

Optional Item - an item with xs:minOccurs='0', so that it is possible for there to be no occurrences at all. Optional elements are optional items, and variable-occurrence arrays where xs:minOccurs='0' are also optional items.

Number of Occurrences - used to discuss dimensionality of arrays and the presence/absence of optional elements.

Potentially represented - an element declaration in the schema describes a potentially represented item if that element declaration does not have an inputValueCalc property. Whether the element declaration describes an item that is actually represented or not depends on whether the element declaration is for a required or optional element, and whether the element has a corresponding value in the augmented infoset.

Physical layer – A DFDL Schema adds format annotations onto an XSDL language schema. The annotations describe the physical representation or physical layer of the data.

Point of uncertainty - A point of uncertainty occurs in the data stream when there is more than one schema component that might occur at that point.

Representation Property – The properties on a component's format annotations that affect the representation of the element. These are all the properties with the exception of dfdl:ref.

Required Element - A scalar element is required. An element of a fixed-occurrence array is required. An element of a variable-occurrence array is required if its index is less than or equal to the value of xs:minOccurs. All other elements are not required.
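
The rules above can be sketched as a small Python function. This is an illustrative reading only: the name is_required, the use of None for 'unbounded', and the 1-based index are assumptions, and the sketch considers only xs:minOccurs and xs:maxOccurs (it ignores the case where DFDL representation properties fix the number of occurrences).

```python
from typing import Optional


def is_required(index: int, min_occurs: int, max_occurs: Optional[int]) -> bool:
    """Decide whether the occurrence at a 1-based index is required.

    Hypothetical helper; max_occurs=None stands for 'unbounded'.
    """
    if min_occurs == 1 and max_occurs == 1:
        # Scalar element: always required.
        return True
    if max_occurs is not None and min_occurs == max_occurs:
        # Fixed-occurrence array: every element is required.
        return True
    if max_occurs is None or max_occurs > 1:
        # Variable-occurrence array: required up to xs:minOccurs.
        return index <= min_occurs
    # All other elements (for example an optional element) are not required.
    return False
```

For example, with xs:minOccurs=2 and xs:maxOccurs='unbounded', the first two occurrences are required and the third is not.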

Required property – A DFDL property that must have a value. The required properties for each xs:schema component are listed in the Property Precedence tables in section 23.

Scalar Element – Not an array and not optional. Specifically xs:maxOccurs=1 and xs:minOccurs=1. Scalar is not to be confused with 'simple'. Scalar is only about the dimensionality of the data, not its complexity/simplicity.
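
As a rough illustration of how the terms Array Element, Optional Element, and Scalar Element partition element declarations by their occurrence constraints, consider the following Python sketch. The function name classify_occurrence and the use of None for 'unbounded' are assumptions made for this example and do not appear in the specification.

```python
from typing import Optional


def classify_occurrence(min_occurs: int, max_occurs: Optional[int]) -> str:
    """Classify a declaration by its occurrence constraints.

    Hypothetical helper; max_occurs=None stands for 'unbounded'.
    """
    if max_occurs is None or max_occurs > 1:
        # xs:maxOccurs > 1 or unbounded.
        return "array"
    if max_occurs == 1 and min_occurs == 0:
        return "optional element"
    if max_occurs == 1 and min_occurs == 1:
        return "scalar"
    # Combinations such as xs:maxOccurs=0 are not covered by these terms.
    raise ValueError("occurrence constraints not covered by this sketch")
```

Note that the classification says nothing about whether the element is of simple or complex type; it concerns dimensionality only.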

Scan – examine the input data bytes looking for delimiters such as separators and terminators.

Scanned length - An item has scanned length when dfdl:lengthKind="delimited" or "pattern", and additionally when dfdl:lengthKind="endOfParent" and the parent has scanned length (recursively).

Schema - The set of all declarations and definitions in the schema, including all included and imported schemas taken together. This includes both the XSDL declarations and definitions, and the DFDL definitions provided in the top-level DFDL annotations.

Schema Definition Order – the order in which the schema components are defined in a schema document.

Specified length - An item has specified length when dfdl:lengthKind="implicit", "explicit", or "prefixed", and additionally when dfdl:lengthKind="endOfParent" and the parent has specified length (recursively).
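
The recursive treatment of dfdl:lengthKind 'endOfParent' in the definitions of scanned length and specified length can be sketched as follows. The Item class, its field names, and the function length_category are hypothetical; only the lengthKind values come from the definitions above.

```python
from dataclasses import dataclass
from typing import Optional

SCANNED_KINDS = {"delimited", "pattern"}
SPECIFIED_KINDS = {"implicit", "explicit", "prefixed"}


@dataclass
class Item:
    """Hypothetical stand-in for a schema item and its enclosing parent."""
    length_kind: str
    parent: Optional["Item"] = None


def length_category(item: Item) -> str:
    """Return 'scanned' or 'specified' for an item."""
    if item.length_kind in SCANNED_KINDS:
        return "scanned"
    if item.length_kind in SPECIFIED_KINDS:
        return "specified"
    if item.length_kind == "endOfParent":
        if item.parent is None:
            raise ValueError("endOfParent needs a parent in this sketch")
        # Inherit the category from the parent, recursively.
        return length_category(item.parent)
    raise ValueError(f"unknown lengthKind: {item.length_kind}")
```

An item with 'endOfParent' nested inside another 'endOfParent' item therefore takes its category from the nearest ancestor that has one of the other length kinds.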

Speculative Parsing – When the parser reaches a point of uncertainty it attempts to parse each option in turn until one is known to exist or known not to exist.
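
A minimal Python sketch of this try-each-option behaviour is given below. It is a simplification under stated assumptions: each option is modelled as a hypothetical function that returns a (value, new position) pair on success or None on failure, and the first success is taken as the option "known to exist". The names parse_speculatively, parse_digits, and parse_letters are invented for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# A branch parser returns (value, new_position) on success, None on failure.
ParseResult = Optional[Tuple[object, int]]
BranchParser = Callable[[str, int], ParseResult]


def parse_digits(data: str, pos: int) -> ParseResult:
    end = pos
    while end < len(data) and data[end] in "0123456789":
        end += 1
    return (int(data[pos:end]), end) if end > pos else None


def parse_letters(data: str, pos: int) -> ParseResult:
    end = pos
    while end < len(data) and data[end].isalpha():
        end += 1
    return (data[pos:end], end) if end > pos else None


def parse_speculatively(
    data: str, pos: int, branches: Sequence[BranchParser]
) -> ParseResult:
    """Try each option in turn from the same position; first success wins."""
    for branch in branches:
        # Every attempt restarts at pos, so a failed option consumes nothing.
        result = branch(data, pos)
        if result is not None:
            return result
    # No option matched at this point of uncertainty.
    return None
```

For instance, parse_speculatively("42x", 0, [parse_digits, parse_letters]) succeeds on the first option, while the same call on "abc1" falls through to the second.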

Target length - When unparsing, the length (in dfdl:lengthUnits) of an item's representation is the target length. The length of the logical data item in the infoset may be shorter or longer than the target length, in which case padding or truncation may be required to make the logical data conform to the target length. Rules for when padding and truncation occur, and how they are applied, are specific to simple data types, and are controlled by a number of DFDL format properties.

Unpadded length - This is the length of the representation of an item of the infoset, prior to any filling or padding which might be introduced due to dfdl:lengthKind="prefixed" or dfdl:lengthKind="explicit". It is equal to or smaller than the target length.
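
The relationship between unpadded length and target length can be illustrated with a short Python sketch. The function fit_to_target is hypothetical, and it assumes text data measured in characters, padding on the right, and truncation on the right only when explicitly allowed; the actual rules depend on the data type and on DFDL format properties.

```python
def fit_to_target(
    value: str, target_length: int, pad_char: str = " ", truncate: bool = False
) -> str:
    """Pad or truncate a text value so that it has the target length.

    Hypothetical helper: lengths are in characters, padding is added on
    the right, and truncation removes characters from the right.
    """
    unpadded_length = len(value)
    if unpadded_length < target_length:
        # Shorter than the target: fill the remainder with padding.
        return value + pad_char * (target_length - unpadded_length)
    if unpadded_length > target_length:
        if not truncate:
            raise ValueError("value is longer than the target length")
        return value[:target_length]
    return value
```

A three-character value with a target length of five gains two pad characters; a six-character value with a target length of four is an error unless truncation is permitted.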

Variable-Occurrence Item - Optional elements have a variable number of occurrences (0 or 1) and arrays also can have a variable number of occurrences (when xs:minOccurs < xs:maxOccurs). So an item with a variable number of occurrences can mean either an optional element, or an array where xs:minOccurs < xs:maxOccurs. For both arrays and optional elements, there is the additional constraint that the DFDL representation properties do not preclude a variable number of occurrences. [1]

[1] When dfdl:occursCountKind='expression' and dfdl:occursCount has a literal constant as its value, or an expression that statically evaluates to a constant, then the DFDL properties are specifying exactly the number of occurrences for all instances and so are said to preclude a variable number of occurrences. If dfdl:occursCount has a formula as its expressed value, then the DFDL properties do not preclude a variable number of occurrences.
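
The definition and its footnote can be read as the following Python sketch. The function name, the use of None for 'unbounded', and the boolean flag occurs_count_is_constant (standing in for "a literal constant or an expression that statically evaluates to a constant") are assumptions made for illustration.

```python
from typing import Optional


def is_variable_occurrence(
    min_occurs: int,
    max_occurs: Optional[int],
    occurs_count_kind: Optional[str] = None,
    occurs_count_is_constant: bool = False,
) -> bool:
    """Decide whether an item has a variable number of occurrences.

    Hypothetical helper; max_occurs=None stands for 'unbounded'.
    """
    # A constant dfdl:occursCount with occursCountKind 'expression' fixes
    # the count, so the properties preclude a variable number of occurrences.
    if occurs_count_kind == "expression" and occurs_count_is_constant:
        return False
    if max_occurs is None:
        # Unbounded is greater than any xs:minOccurs.
        return True
    # Covers optional elements (0 < 1) and arrays with minOccurs < maxOccurs.
    return min_occurs < max_occurs
```

Under this reading, an array with xs:minOccurs=0 and xs:maxOccurs='unbounded' stops being a variable-occurrence item once dfdl:occursCount is a constant.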

Copyright (C) Open Grid Forum (2005-2010). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the OGF or other organizations, except as needed for the purpose of developing Grid Recommendations in which case the procedures for copyrights defined in the OGF Document process must be followed, or as required to translate it into languages other than English.