Data Format Description Language (DFDL) v1.0 Specification
OGF Proposed Recommendation GFD-P-R.174, January 31, 2011
| Property Name | Description |
|---|---|
| textNumberRep | Enum. Valid values are 'standard' and 'zoned'. 'standard' means represented as characters in the 'encoding' code page. 'zoned' means represented as a zoned decimal in the 'encoding' code page. Zoned is not supported for float and double numbers. Annotation: dfdl:element, dfdl:simpleType |
| textNumberJustification | Enum. Valid values are 'left', 'right' and 'center'. Controls how the data is padded or trimmed on parsing and unparsing. Behavior is as for dfdl:textStringJustification. Annotation: dfdl:element, dfdl:simpleType |
| textNumberPadCharacter | DFDL String Literal. The value that is used when padding or trimming number elements. The value can be a single character or a single byte. If a character, then it can be specified using a literal character or using DFDL entities. If a byte, then it must be specified using a single byte value entity. If a pad character is specified when dfdl:lengthUnits is 'bytes', then the pad character must be a single-byte character. If a pad byte is specified when dfdl:lengthUnits is 'characters', then the encoding must be a fixed-width encoding, and padding and trimming must be applied using a sequence of N pad bytes, where N is the width of a character in the fixed-width encoding. Annotation: dfdl:element, dfdl:simpleType |
| textNumberPattern | String. Defines the ICU-like pattern that describes the format of the text number. The pattern defines where grouping separators, decimal separators, implied decimal points, exponents, positive signs and negative signs appear. It permits definition by either digits/fractions or significant digits, and it allows rounding. When dfdl:textNumberRep is 'standard', this property only applies when dfdl:textStandardBase is 10. When dfdl:textNumberRep is 'standard' and dfdl:textStandardBase is not 10, the number is represented as the minimum number of characters needed to represent the digits, with no sign and no virtual decimal point. The syntax of dfdl:textNumberPattern is described in Section 13.6.1, The textNumberPattern Property. Annotation: dfdl:element, dfdl:simpleType |
| textNumberRounding | Enum. Valid values are 'pattern' and 'explicit'. Specifies how rounding is controlled during unparsing. When dfdl:textNumberRep is 'standard', this property only applies when dfdl:textStandardBase is 10. If 'pattern', then the rounding increment is specified in the dfdl:textNumberPattern using digits '1' through '9', and the rounding mode is always 'roundHalfEven'; to switch off rounding, do not use digits '1' through '9'. If 'explicit', then the rounding increment is specified by the dfdl:textNumberRoundingIncrement property, and any digits '1' through '9' in the dfdl:textNumberPattern are treated as digit '0'; the rounding mode is specified by the dfdl:textNumberRoundingMode property, and to switch off rounding, use 0.0 for dfdl:textNumberRoundingIncrement. Annotation: dfdl:element, dfdl:simpleType |
| textNumberRoundingMode | Enum. Valid values are 'roundCeiling', 'roundFloor', 'roundDown', 'roundUp', 'roundHalfEven', 'roundHalfDown' and 'roundHalfUp'. Specifies how rounding occurs during unparsing when dfdl:textNumberRounding is 'explicit'. When dfdl:textNumberRep is 'standard', this property only applies when dfdl:textStandardBase is 10. Annotation: dfdl:element, dfdl:simpleType |
| textNumberRoundingIncrement | Double. Specifies the rounding increment to use during unparsing when dfdl:textNumberRounding is 'explicit'. When dfdl:textNumberRep is 'standard', this property only applies when dfdl:textStandardBase is 10. To switch off rounding, use 0.0. A negative value is a schema definition error. Annotation: dfdl:element, dfdl:simpleType |
| textNumberCheckPolicy | Enum. Valid values are 'strict' and 'lax'. Indicates how lenient to be when parsing against the pattern. When dfdl:textNumberRep is 'standard', this property only applies when dfdl:textStandardBase is 10. If 'lax' and dfdl:textNumberRep is 'standard', then grouping separators can be omitted, the decimal separator can be either '.' or ',' (as long as this is unambiguous), a leading positive sign can be omitted, all whitespace is treated as zero, and leading and trailing whitespace is ignored. The exponent is also optional and is assumed to be '1' if not supplied. If 'lax' and dfdl:textNumberRep is 'zoned', then positive punched data is accepted when parsing an unsigned type, and unpunched data is accepted when parsing a signed type. On unparsing, the pattern is always followed, and the rules in Section 13.6.2, Converting logical numbers to/from text representation, apply. Annotation: dfdl:element, dfdl:simpleType |
| textStandardDecimalSeparator | DFDL String Literal or DFDL Expression. Defines the single character that will appear in the data as the decimal separator. This property is applicable when dfdl:textNumberRep is 'standard' and dfdl:textStandardBase is 10. This property can be computed by way of an expression which returns a character. The expression must not contain forward references to elements which have not yet been processed. Annotation: dfdl:element, dfdl:simpleType |
| textStandardGroupingSeparator | DFDL String Literal or DFDL Expression. Defines the single character that will appear in the data as the grouping separator. This property is applicable when dfdl:textNumberRep is 'standard' and dfdl:textStandardBase is 10. This property can be computed by way of an expression which returns a character. The expression must not contain forward references to elements which have not yet been processed. Annotation: dfdl:element, dfdl:simpleType |
| textStandardExponentCharacter | DFDL String Literal or DFDL Expression. Defines the actual character that will appear in the data as the exponent indicator. If the empty string is specified, then no exponent character will be used. This property is applicable when dfdl:textNumberRep is 'standard' and dfdl:textStandardBase is 10. This property can be computed by way of an expression which returns a character. The expression must not contain forward references to elements which have not yet been processed. If dfdl:ignoreCase is 'yes', then the case of the string is ignored by the parser. Annotation: dfdl:element, dfdl:simpleType |
| textStandardInfinityRep | DFDL String Literal. The value used to represent infinity. Infinity is represented as a string with the positive or negative prefixes and suffixes from the dfdl:textNumberPattern applied. This property is applicable when dfdl:textNumberRep is 'standard', dfdl:textStandardBase is 10 and the simple type is float or double. If dfdl:ignoreCase is 'yes', then the case of the string is ignored by the parser. Annotation: dfdl:element, dfdl:simpleType |
| textStandardNanRep | DFDL String Literal. The value used to represent NaN. NaN is represented as a string, and the positive or negative prefixes and suffixes from the dfdl:textNumberPattern are not used. This property is applicable when dfdl:textNumberRep is 'standard', dfdl:textStandardBase is 10 and the simple type is float or double. If dfdl:ignoreCase is 'yes', then the case of the string is ignored by the parser. Annotation: dfdl:element, dfdl:simpleType |
| textStandardZeroRep | List of DFDL String Literals. Valid values are the empty string or any character string. The whitespace-separated list of alternative literal strings that are equivalent to zero, for example the characters 'zero'. On unparsing, the first value is used. If dfdl:ignoreCase is 'yes', then the case of the string is ignored by the parser. The empty string means that there is no special literal string for zero. This property is applicable when dfdl:textNumberRep is 'standard' and dfdl:textStandardBase is 10. Annotation: dfdl:element, dfdl:simpleType |
| textStandardBase | Non-negative Integer. Valid values are 2, 8, 10 and 16. Indicates the number base. Only used when dfdl:textNumberRep is 'standard'. When the base is not 10, xs:decimal, xs:float and xs:double are not supported. When dfdl:textNumberRep is 'zoned', dfdl:textStandardBase 10 is assumed. Annotation: dfdl:element, dfdl:simpleType |
| textZonedSignStyle | Enum. Valid values are 'asciiStandard', 'asciiTranslatedEBCDIC' and 'asciiCARealiaModified'. Specifies the characters that are used to overpunch the sign nibble when the encoding is an ASCII character set. The location of this sign nibble is indicated in the dfdl:textNumberPattern. This property is applicable when dfdl:textNumberRep is 'zoned'. It is used only when the encoding specifies an ASCII-derived character set, that is, one in which the printable character code points 0x20 to 0x7E are the same as US-ASCII. This includes all the Unicode character sets and all variations of ASCII and ISO-8859. Which characters are used to represent overpunched (included) positive and negative signs varies by encoding, COBOL compiler and system. It is fixed for EBCDIC systems but not for ASCII. In EBCDIC-based encodings, the characters '{ABCDEFGHI' or '0123456789' represent a positive sign and digits 0 to 9 (character codes 0xC0 to 0xC9 or 0xF0 to 0xF9). The characters '}JKLMNOPQR' or '^£¥·©§¶¼½¾' represent a negative sign and digits 0 to 9 (character codes 0xD0 to 0xD9 or 0xB0 to 0xB9). On parsing, both ranges of characters are accepted. On unparsing, the range 0xC0 to 0xC9 is produced for positive signs and 0xD0 to 0xD9 is produced for negative signs. asciiStandard: the ASCII characters '0123456789' represent a positive sign and the corresponding digit (the sign nibble for '+' is 0x3, which is the high nibble of these character codes unmodified). The ASCII characters 'pqrstuvwxy' represent a negative sign and digits 0 to 9 (character codes 0x70 to 0x79). asciiTranslatedEBCDIC: the overpunched character is the ASCII equivalent of the EBCDIC character above, so the characters '{ABCDEFGHI' still represent a positive sign and digits 0 to 9 (character codes 0x7B and 0x41 through 0x49), and the characters '}JKLMNOPQR' still represent a negative sign and digits 0 to 9 (character codes 0x7D and 0x4A through 0x52). This case comes up if EBCDIC zoned decimal data is translated to ASCII as if it were textual data. asciiCARealiaModified: the ASCII characters '0123456789' represent a positive sign and digits 0 to 9, as in asciiStandard. However, the ASCII characters from code 0x20 to 0x29 are used for a negative sign and the corresponding decimal digit, and these do not translate well into printing characters: the space (' ') for 0, the characters !"#$%& for 1 through 6, the apostrophe (') for 7, and the parentheses ( and ) for 8 and 9. Annotation: dfdl:element, dfdl:simpleType |
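The ASCII sign styles described for dfdl:textZonedSignStyle amount to a lookup from the overpunched character to a sign and a digit. The following Python sketch shows that lookup for parsing. The names `OVERPUNCH` and `decode_zoned` are hypothetical, the sign position is passed in as a flag rather than read from dfdl:textNumberPattern, and 'lax' acceptance of unpunched data is not modelled.

```python
# Hypothetical helper for illustration; not part of any DFDL processor API.
# (positive characters, negative characters) per dfdl:textZonedSignStyle, indexed by digit 0-9.
OVERPUNCH = {
    "asciiStandard": ("0123456789", "pqrstuvwxy"),
    "asciiTranslatedEBCDIC": ("{ABCDEFGHI", "}JKLMNOPQR"),
    "asciiCARealiaModified": ("0123456789", " !\"#$%&'()"),
}


def decode_zoned(text: str, style: str, sign_at_end: bool = True) -> int:
    """Decode ASCII zoned text whose last (or first) character carries the overpunched sign.

    sign_at_end=True stands for a trailing '+' in the pattern, False for a leading '+'.
    """
    if not text:
        raise ValueError("empty zoned field")
    positive, negative = OVERPUNCH[style]
    idx = len(text) - 1 if sign_at_end else 0
    ch = text[idx]
    if ch in positive:
        sign, digit = 1, positive.index(ch)
    elif ch in negative:
        sign, digit = -1, negative.index(ch)
    else:
        raise ValueError(f"{ch!r} is not a {style} overpunch character")
    # Replace the overpunched character with its plain digit, then check the remaining digits.
    digits = text[:idx] + str(digit) + text[idx + 1:]
    if any(c not in "0123456789" for c in digits):
        raise ValueError(f"non-digit character in {text!r}")
    return sign * int(digits)
```

For example, `decode_zoned("12r", "asciiStandard")` yields -122, and the same digits under 'asciiTranslatedEBCDIC' would be written "12K".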
The dfdl:textNumberPattern describes how to parse and unparse text representations of number logical types with base 10.
The length of the representation of the number is determined first, and the number pattern is used only for conversion of the text to and from a numeric logical infoset value.
The pattern described below is derived from the ICU DecimalFormat class described here: [ICUDecForm]
The pattern is an ICU-like syntax that defines where grouping separators, decimal separators, implied decimal points, exponents, positive signs and negative signs appear. It permits definition by either digits/fractions or significant digits.
If the pattern uses digits/fractions, then these must match any XML schema facets. If they do not, it is a schema definition error.
13.6.1.1 dfdl:textNumberPattern for dfdl:textNumberRep 'standard'
When dfdl:textNumberRep is 'standard', this property only applies when dfdl:textStandardBase is 10.
The pattern comes in two parts separated by a semicolon. The first is mandatory and applies to positive numbers; the second is optional and applies to negative numbers.
Examples: the first shows digits/fractions and positive/negative signs, the second shows an exponent, and the third shows an implied decimal point.
+###,##0.00;(###,##0.00)
##0.0#E0
+###V#0
Note that 'V' is used to indicate the location of an implied decimal point for fixed point number representations. (This is an extension to the ICU pattern language.)
The actual grouping separator, decimal separator and exponent characters are defined independently of the pattern.
The actual positive sign and negative sign are defined within the pattern itself.
Many characters in a pattern are taken literally; they are matched during parsing and output unchanged during formatting. Special characters, on the other hand, stand for other characters, strings, or classes of characters. For example, the '#' character is replaced by a digit.
| Symbol | Location | Meaning |
|---|---|---|
| 0 | Number | Digit |
| 1-9 | Number | '1' through '9' indicate rounding; otherwise they behave as digit '0'. |
| # | Number | Digit, zero shows as absent |
| . | Number | Decimal separator or monetary decimal separator |
| - | Number | Minus sign |
| , | Number | Grouping separator |
| E | Number | Separates mantissa and exponent in scientific notation. Need not be quoted in prefix or suffix. |
| + | Exponent | Prefix positive exponents with a plus sign. Need not be quoted in prefix or suffix. |
| ; | Subpattern boundary | Separates positive and negative subpatterns |
| ' | Prefix or suffix | Used to quote special characters in a prefix or suffix; for example, "'#'#" formats 123 to "#123". To create a single quote itself, use two in a row: "# o''clock". |
| * | Prefix or suffix boundary | Pad escape, precedes pad character |
| V | Number | Virtual decimal point marker. Only used with decimal, float and double simple types. |
| P | Number | Decimal scaling position. Only used with decimal, float and double simple types. |
A pattern contains a positive subpattern and an optional negative subpattern, for example, "#,##0.00;(#,##0.00)". Each subpattern has a prefix, a numeric part, and a suffix. If there is no explicit negative subpattern, the negative subpattern is the minus sign prefixed to the positive subpattern. That is, "0.00" alone is equivalent to "0.00;-0.00". If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are ignored in the negative subpattern. That means that "#,##0.0#;(#)" has precisely the same result as "#,##0.0#;(#,##0.0#)".
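As a rough illustration of how the positive and negative prefixes and suffixes could be pulled out of a pattern, here is a Python sketch. The helper names are hypothetical; it assumes there is no pad specifier, treats 0-9, '#', ',', '.', 'V' and 'P' as number-region characters, and expects any literal V or P in a prefix or suffix to be quoted.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration; pad specifiers ('*x') are not handled.
NUMBER_CHARS = set("0123456789#,.VP")


def split_subpatterns(pattern: str) -> tuple[str, Optional[str]]:
    """Split a pattern at the first ';' that is not inside quotes."""
    in_quote = False
    for i, c in enumerate(pattern):
        if c == "'":
            in_quote = not in_quote
        elif c == ";" and not in_quote:
            return pattern[:i], pattern[i + 1:]
    return pattern, None


def unquote(affix: str) -> str:
    """Drop quote marks from a prefix or suffix; '' stands for one literal quote."""
    out, i = [], 0
    while i < len(affix):
        if affix[i] == "'":
            if affix[i + 1:i + 2] == "'":
                out.append("'")
                i += 2
            else:
                i += 1
            continue
        out.append(affix[i])
        i += 1
    return "".join(out)


def split_affixes(sub: str) -> tuple[str, str, str]:
    """Return (prefix, number region, suffix) for one subpattern."""
    i, in_quote = 0, False
    while i < len(sub) and (in_quote or sub[i] not in NUMBER_CHARS):
        if sub[i] == "'":
            in_quote = not in_quote
        i += 1
    start = i
    while i < len(sub) and sub[i] in NUMBER_CHARS:
        i += 1
    exponent = re.match(r"E\+?0+", sub[i:])  # optional exponent such as "E0" or "E+00"
    if exponent:
        i += exponent.end()
    return unquote(sub[:start]), sub[start:i], unquote(sub[i:])


def affixes(pattern: str) -> tuple[str, str, str, str]:
    """Return (positive prefix, positive suffix, negative prefix, negative suffix)."""
    positive, negative = split_subpatterns(pattern)
    pos_prefix, _, pos_suffix = split_affixes(positive)
    if negative is None:
        # No explicit negative subpattern: the minus sign is prefixed to the positive one.
        return pos_prefix, pos_suffix, "-" + pos_prefix, pos_suffix
    # Only the prefix and suffix of an explicit negative subpattern are used.
    neg_prefix, _, neg_suffix = split_affixes(negative)
    return pos_prefix, pos_suffix, neg_prefix, neg_suffix
```

Because the digits of the negative subpattern are discarded, `affixes("#,##0.0#;(#)")` and `affixes("#,##0.0#;(#,##0.0#)")` both return the same four strings.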
The prefixes, suffixes, and various symbols used for infinity, digits, grouping separators, decimal separators, etc. may be set to arbitrary values, and they will appear properly during formatting. However, care must be taken that the symbols and strings do not conflict, or parsing will be unreliable. For example, either the positive and negative prefixes or the suffixes must be distinct for parsing to be able to distinguish positive from negative values. Another example is that the decimal separator and grouping separator should be distinct characters, or parsing will be impossible.
The grouping separator is a character that separates clusters of integer digits to make large numbers more legible. It is commonly used for thousands, but in some locales it separates ten-thousands. The grouping size is the number of digits between the grouping separators, such as 3 for "100,000,000" or 4 for "1 0000 0000". There are actually two different grouping sizes: one used for the least significant integer digits, the primary grouping size, and one used for all others, the secondary grouping size. In most locales these are the same, but sometimes they are different. For example, if the primary grouping interval is 3 and the secondary is 2, then this corresponds to the pattern "#,##,##0", and the number 123456789 is formatted as "12,34,56,789". If a pattern contains multiple grouping separators, the interval between the last one and the end of the integer defines the primary grouping size, and the interval between the last two defines the secondary grouping size. All others are ignored, so "#,##,###,####" == "###,###,####" == "##,#,###,####".
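The two grouping sizes can be read from the integer part of a pattern and then applied to a digit string. This Python sketch follows the rule above (last interval is primary, the one before it is secondary); the function names are hypothetical, and the `separator` argument stands in for the value of dfdl:textStandardGroupingSeparator.

```python
# Hypothetical helpers for illustration; not part of any DFDL processor API.
def grouping_sizes(integer_pattern: str) -> tuple[int, int]:
    """Primary and secondary grouping sizes from the integer part of a pattern, e.g. '#,##,##0'."""
    groups = integer_pattern.split(",")
    if len(groups) == 1:
        return 0, 0  # no grouping separator in the pattern
    primary = len(groups[-1])  # interval between the last separator and the end
    secondary = len(groups[-2]) if len(groups) > 2 else primary  # interval between the last two
    return primary, secondary


def group_digits(digits: str, primary: int, secondary: int, separator: str = ",") -> str:
    """Insert grouping separators into a string of integer digits."""
    if primary <= 0 or len(digits) <= primary:
        return digits
    head, groups = digits[:-primary], [digits[-primary:]]
    step = secondary if secondary > 0 else primary
    while len(head) > step:
        groups.insert(0, head[-step:])
        head = head[:-step]
    groups.insert(0, head)
    return separator.join(groups)
```

With these helpers, `group_digits("123456789", *grouping_sizes("#,##,##0"))` gives "12,34,56,789", and all three of the equivalent patterns above yield sizes (4, 3).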
The P symbol is used to specify the location of an assumed decimal point when the point is not within the number that appears in the data.
The symbol P can be specified only as a continuous string of Ps in the leftmost or rightmost digit positions in the number region of the pattern. The implied decimal point is then assumed to be the leftmost or rightmost character of the number region, on the same side as the Ps.
It is a schema definition error if any symbols other than "0", "1"-"9" or # are used in the same number region of the pattern as V or P.
| Data representation | Pattern | Value |
|---|---|---|
| 123 | PP000 | 0.00123 |
| 123 | 000PP | 12300 |
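The table can be reproduced with a small Python sketch of P scaling. It assumes the digit string has already been extracted from the data, and it scales by the number of digit positions in the pattern (so "5" read with "PP000" is treated like "005"); the function name is hypothetical, and the `ValueError`s stand in for schema definition errors.

```python
from decimal import Decimal


# Hypothetical helper for illustration; not part of any DFDL processor API.
def apply_p_scaling(digits: str, number_region: str) -> Decimal:
    """Value of digits read from the data for a number region that uses P, e.g. 'PP000'."""
    if any(c not in "P#0123456789" for c in number_region):
        raise ValueError("only P, '#' and digits may share a number region with P")
    leading = len(number_region) - len(number_region.lstrip("P"))
    trailing = len(number_region) - len(number_region.rstrip("P"))
    if leading and trailing:
        raise ValueError("P must be leftmost or rightmost, not both")
    if "P" in number_region[leading:len(number_region) - trailing]:
        raise ValueError("P must form one continuous string")
    positions = len(number_region) - leading - trailing  # digit positions in the pattern
    value = Decimal(digits)
    if leading:
        # Decimal point is left of the Ps: every P and every digit position is fractional.
        return value.scaleb(-(leading + positions))
    # Decimal point is right of the Ps: each P multiplies the value by ten.
    return value.scaleb(trailing)
```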
pattern := subpattern (';' subpattern)?
subpattern := prefix? ((number exponent?) | vpinteger) suffix?
number := (integer ('.' fraction)?)
vpinteger := (pinteger | vinteger)
pinteger := ('P'* integer) | (integer 'P'* )
vinteger := ('V'? integer) |
('#'* 'V'? integer)|
('#'* '0'* 'V'? '0'* '0')|
(integer 'V'?)
prefix := '\u0000'..'\uFFFD' - specialCharacters
suffix := '\u0000'..'\uFFFD' - specialCharacters
integer := '#'* '0'* '0'
fraction := '0'* '#'*
exponent := 'E' '+'? '0'* '0'
padSpec := '*' padChar
padChar := '\u0000'..'\uFFFD' - quote
Notation:
X* 0 or more instances of X
X? 0 or 1 instances of X
X|Y either X or Y
C..D any character from C up to D, inclusive
S-T characters in S, except those in T
In the dfdl:textNumberPattern syntax, the first subpattern is for positive numbers. The second (optional) subpattern is for negative numbers.
Not indicated in the BNF syntax above:
The grouping separator ',' can occur inside the integer elements, between any two pattern characters of that element, as long as the integer or sigDigits element is not followed by the exponent element.
Two grouping intervals are recognized: that between the decimal point and the first grouping symbol, and that between the first and second grouping symbols. These intervals are identical in most locales, but in some locales they differ. For example, the pattern "#,##,###" formats the number 123456789 as "12,34,56,789".
The pad specifier padSpec may appear before the prefix, after the prefix, before the suffix, after the suffix, or not at all.
In place of '0', the digits '1' through '9' may be used to indicate a rounding increment.
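The grammar for the number region, together with the grouping and rounding-digit notes above, can be checked with a few regular expressions. This Python sketch is a hypothetical helper covering only the `number exponent?` form; it does not handle prefixes, suffixes, padding, V or P.

```python
import re


# Hypothetical helper for illustration; covers only: integer ('.' fraction)? exponent?
def is_valid_number_region(region: str) -> bool:
    mantissa, has_exp, exponent = region.partition("E")
    if has_exp and not re.fullmatch(r"\+?0+", exponent):
        return False  # exponent := 'E' '+'? '0'* '0'
    integer, has_point, fraction = mantissa.partition(".")
    if has_point and not re.fullmatch(r"[0-9]*#*", fraction):
        return False  # fraction := '0'* '#'*, with '1'-'9' allowed as rounding digits
    if "," in integer:
        if has_exp:
            return False  # no grouping separators when an exponent follows
        if integer.startswith(",") or integer.endswith(",") or ",," in integer:
            return False  # ',' must sit between two pattern characters
        integer = integer.replace(",", "")
    # integer := '#'* '0'* '0', with '1'-'9' allowed in place of '0'
    return re.fullmatch(r"#*[0-9]*[0-9]", integer) is not None
```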
Parsing
During parsing, grouping separators are removed from the data.
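A minimal Python sketch of parsing a base-10 field whose length has already been determined: match the negative or positive prefix and suffix, remove grouping separators, and normalize the decimal separator. The function name is hypothetical; `decimal_sep` and `grouping_sep` stand in for dfdl:textStandardDecimalSeparator and dfdl:textStandardGroupingSeparator, and the positions of grouping separators are not checked.

```python
from decimal import Decimal


# Hypothetical helper for illustration; not part of any DFDL processor API.
def parse_standard(text: str, pos_prefix: str, pos_suffix: str, neg_prefix: str, neg_suffix: str,
                   decimal_sep: str = ".", grouping_sep: str = ",") -> Decimal:
    def strip(prefix: str, suffix: str):
        if text.startswith(prefix) and text.endswith(suffix) and len(text) >= len(prefix) + len(suffix):
            return text[len(prefix):len(text) - len(suffix)]
        return None

    # Try the longer affixes first, so "-5" is not accepted via an empty positive prefix.
    candidates = [(neg_prefix, neg_suffix, -1), (pos_prefix, pos_suffix, 1)]
    candidates.sort(key=lambda c: len(c[0]) + len(c[1]), reverse=True)
    for prefix, suffix, sign in candidates:
        body = strip(prefix, suffix)
        if body is None:
            continue
        # Grouping separators are removed; the decimal separator becomes '.'.
        body = body.replace(grouping_sep, "").replace(decimal_sep, ".")
        if (any(c in "0123456789" for c in body)
                and all(c in "0123456789." for c in body) and body.count(".") <= 1):
            return sign * Decimal(body)
    raise ValueError(f"{text!r} does not match the pattern")
```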
Formatting
Formatting is guided by several parameters all of which can be specified using a pattern. The following description applies to formats that do not use scientific notation.
If the number of actual integer digits exceeds the maximum integer digits, then only the least significant digits are shown. For example, 1997 is formatted as "97" if the maximum integer digits is set to 2.
If the number of actual integer digits is less than the minimum integer digits, then leading zeros are added. For example, 1997 is formatted as "01997" if the minimum integer digits is set to 5.
If the number of actual fraction digits exceeds the maximum fraction digits, then half-even rounding is performed to the maximum fraction digits. For example, 0.125 is formatted as "0.12" if the maximum fraction digits is 2. This behavior can be changed by specifying a rounding increment and a rounding mode.
If the number of actual fraction digits is less than the minimum fraction digits, then trailing zeros are added. For example, 0.125 is formatted as "0.1250" if the minimum fraction digits is set to 4.
Trailing fractional zeros are not displayed if they occur j positions after the decimal, where j is less than the maximum fraction digits. For example, 0.10004 is formatted as "0.1" if the maximum fraction digits is four or less.
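The digit-count rules above can be sketched in Python with the standard `decimal` module. The function name and its four digit-count parameters are hypothetical, the negative sign is assumed to be the default '-' prefix, and the minimum fraction digits is assumed not to exceed the maximum.

```python
from decimal import Decimal, ROUND_HALF_EVEN


# Hypothetical helper for illustration; not part of any DFDL processor API.
def format_fixed(value, min_int: int, max_int: int, min_frac: int, max_frac: int) -> str:
    v = Decimal(str(value))
    sign = "-" if v < 0 else ""
    # Half-even rounding to the maximum fraction digits.
    v = abs(v).quantize(Decimal(1).scaleb(-max_frac), rounding=ROUND_HALF_EVEN)
    int_part, _, frac_part = format(v, "f").partition(".")
    int_part = int_part.lstrip("0")
    if len(int_part) > max_int:
        int_part = int_part[len(int_part) - max_int:]  # keep only the least significant digits
    int_part = int_part.zfill(min_int)  # leading zeros up to the minimum integer digits
    frac_part = frac_part.rstrip("0").ljust(min_frac, "0")  # drop optional trailing zeros
    return sign + int_part + ("." + frac_part if frac_part else "")
```

Each example in the paragraphs above corresponds to one call, for instance `format_fixed(1997, 5, 10, 0, 0)` for the minimum of five integer digits.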
Special Values
NaN is represented as a string determined by the dfdl:textStandardNanRep property. This is the only value for which the prefixes and suffixes are not used.
Infinity is represented as a string with the positive or negative prefixes and suffixes applied. The infinity string is determined by the dfdl:textStandardInfinityRep property.
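The asymmetry between the two special values can be shown in a few lines of Python. The function name is hypothetical, and "Inf" and "NaN" in the usage are example values for dfdl:textStandardInfinityRep and dfdl:textStandardNanRep, not defaults.

```python
import math


# Hypothetical helper for illustration; returns None for ordinary numbers.
def format_special(x: float, pos_prefix: str, pos_suffix: str, neg_prefix: str, neg_suffix: str,
                   infinity_rep: str, nan_rep: str):
    if math.isnan(x):
        return nan_rep  # prefixes and suffixes are never applied to NaN
    if math.isinf(x):
        if x > 0:
            return pos_prefix + infinity_rep + pos_suffix
        return neg_prefix + infinity_rep + neg_suffix
    return None
```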
Scientific Notation
Numbers in scientific notation are expressed as the product of a mantissa and a power of ten; for example, 1234 can be expressed as $1.234 \times 10^{3}$. The mantissa is typically in the half-open interval [1.0, 10.0) or sometimes [0.0, 1.0), but it need not be. In a pattern, the exponent character immediately followed by one or more digit characters indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3".
The number of digit characters after the exponent character gives the minimum exponent digit count. There is no maximum. Negative exponents are formatted using the minus sign, not the prefix and suffix from the pattern. This allows patterns such as "0.###E0 m/s". To prefix positive exponents with a plus sign, specify '+' between the exponent character and the digits: "0.###E+0" produces "1E+1", "1E+0", "1E-1", etc.
The minimum number of integer digits is achieved by adjusting the exponent. Example: 0.00123 formatted with "00.###E0" yields "12.3E-4". This only happens if there is no maximum number of integer digits. If there is a maximum, then the minimum number of integer digits is fixed at one.
The maximum number of integer digits, if present, specifies the exponent grouping. The most common use of this is to generate engineering notation, in which the exponent is a multiple of three, e.g., "##0.###E0". The number 12345 is formatted using "##0.####E0" as "12.345E3".
When using scientific notation, the formatter controls the digit counts using significant digits logic. The maximum number of significant digits limits the total number of integer and fraction digits that will be shown in the mantissa; it does not affect parsing. For example, 12345 formatted with "##0.##E0" is "12.3E3".
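The exponent rules above can be combined into one Python sketch for bare exponential patterns (no prefix, suffix, padding or rounding increment). It is a hypothetical helper, not a reference algorithm: it takes the maximum significant digits as minimum integer digits plus maximum fraction digits, rounds half-even to that many digits first, and then picks the exponent either as a multiple of the maximum integer digits (exponent grouping) or so that the minimum integer digits are shown.

```python
from decimal import Decimal, ROUND_HALF_EVEN


# Hypothetical helper for illustration; patterns like '0.###E0', '##0.##E0', '0.###E+0'.
def format_scientific(value, pattern: str) -> str:
    mantissa_pat, _, exp_pat = pattern.partition("E")
    plus_sign = exp_pat.startswith("+")
    min_exp_digits = len(exp_pat.lstrip("+"))
    int_pat, _, frac_pat = mantissa_pat.partition(".")
    max_int, min_int = len(int_pat), int_pat.count("0")
    max_frac, min_frac = len(frac_pat), frac_pat.count("0")
    grouped = max_int > min_int and max_int > 1  # a maximum integer digit count groups the exponent
    sig = min_int + max_frac  # maximum significant digits shown in the mantissa

    v = Decimal(str(value))
    sign = "-" if v < 0 else ""
    v = abs(v)
    if v == 0:
        exp, mantissa = 0, "0" * max(min_int, 1)
    else:
        # Round to `sig` significant digits first, so a carry (9.99 -> 10.0) updates the exponent.
        q = v.quantize(Decimal(1).scaleb(v.adjusted() - sig + 1), rounding=ROUND_HALF_EVEN)
        e = q.adjusted()  # power of ten of the leading digit
        exp = e - (e % max_int) if grouped else e - (min_int - 1)
        mantissa = format(q.scaleb(-exp), "f")
    int_part, _, frac_part = mantissa.partition(".")
    frac_part = frac_part.rstrip("0").ljust(min_frac, "0")
    exp_sign = "-" if exp < 0 else ("+" if plus_sign else "")
    return (sign + int_part + ("." + frac_part if frac_part else "")
            + "E" + exp_sign + str(abs(exp)).zfill(min_exp_digits))
```

The checks below reuse the examples from this section; Python's floor modulo keeps the grouped exponent a multiple of three for negative exponents as well.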
Exponential patterns may not contain grouping separators.
Padding
Padding may be specified through the pattern syntax. In a pattern the pad escape character, followed by a single pad character, causes padding to be parsed and formatted. The pad escape character is '*'. For example, "*x#,##0.00" formats 123 to "xx123.00", and 1234 to "1,234.00".
When padding is in effect, the width of the positive subpattern, including prefix and suffix, determines the format width. For example, in the pattern "* #0 o''clock", the format width is 10.
The width is counted in 16-bit code units.
Some parameters which usually do not matter have meaning when padding is used, because the pattern width is significant with padding. In the pattern "* ##,##,#,##0.##", the format width is 14. The initial characters "##,##," do not affect the grouping size or maximum integer digits, but they do affect the format width.
Padding may be inserted at one of four locations: before the prefix, after the prefix, before the suffix, or after the suffix. If there is no prefix, before the prefix and after the prefix are equivalent, likewise for the suffix.
When specified in a pattern, the 32-bit code point immediately following the pad escape is the pad character. This may be any character, including a special pattern character. That is, the pad escape escapes the following character. If there is no character after the pad escape, then the pattern is illegal.
Note: This padding is in addition to the normal DFDL text padding.
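A Python sketch of the pattern-padding rules: compute the format width of the positive subpattern (quote marks take no width, '' counts as one character, the pad escape and its pad character are skipped, and width is counted in UTF-16 code units), then insert pad characters at one of the four positions. The function names and the position strings such as "beforePrefix" are hypothetical labels, not DFDL property values.

```python
# Hypothetical helpers for illustration; not part of any DFDL processor API.
def format_width(subpattern: str) -> int:
    width, i, in_quote = 0, 0, False
    while i < len(subpattern):
        c = subpattern[i]
        if c == "'":
            if subpattern[i + 1:i + 2] == "'":
                width += 1  # '' is one literal quote character
                i += 2
            else:
                in_quote = not in_quote  # quote marks themselves take no width
                i += 1
            continue
        if c == "*" and not in_quote:
            i += 2  # skip the pad escape and its pad character
            continue
        width += len(c.encode("utf-16-le")) // 2  # width in 16-bit code units
        i += 1
    return width


def pad(prefix: str, number: str, suffix: str, width: int, pad_char: str, position: str) -> str:
    used = len((prefix + number + suffix).encode("utf-16-le")) // 2
    fill = pad_char * max(width - used, 0)
    layouts = {
        "beforePrefix": fill + prefix + number + suffix,
        "afterPrefix": prefix + fill + number + suffix,
        "beforeSuffix": prefix + number + fill + suffix,
        "afterSuffix": prefix + number + suffix + fill,
    }
    return layouts[position]
```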
Rounding
How rounding is controlled is given by dfdl:textNumberRounding. The rounding increment may be specified in the dfdl:textNumberPattern itself using digits '1' through '9' or using an explicit increment in dfdl:textNumberRoundingIncrement. For example, 1230 rounded to the nearest 50 is 1250. 1.234 rounded to the nearest 0.65 is 1.3.
Rounding only affects the string produced by formatting. It does not affect parsing or change any numerical values.
In a pattern, digits '1' through '9' specify rounding, but otherwise behave identically to digit '0'. For example, "#,#50" specifies a rounding increment of 50.
Using digits in a pattern, rounding is always 'half even', meaning that a value is rounded to the nearest multiple of the rounding increment or, if it is exactly halfway between two multiples, to the even multiple.
Using an explicit rounding increment, dfdl:textNumberRoundingMode determines how values are rounded.
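Rounding to an increment can be sketched with Python's `decimal` module, whose rounding constants correspond to the dfdl:textNumberRoundingMode values. The helper names are hypothetical; `increment_from_pattern` assumes a number region without an exponent and reads the increment from the pattern's digits, and the `ValueError` stands in for the schema definition error on a negative increment.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR, ROUND_HALF_DOWN,
                     ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_UP)

# dfdl:textNumberRoundingMode values mapped onto Python decimal rounding constants.
ROUNDING_MODES = {
    "roundCeiling": ROUND_CEILING, "roundFloor": ROUND_FLOOR,
    "roundDown": ROUND_DOWN, "roundUp": ROUND_UP,
    "roundHalfEven": ROUND_HALF_EVEN, "roundHalfDown": ROUND_HALF_DOWN,
    "roundHalfUp": ROUND_HALF_UP,
}


# Hypothetical helper: rounding increment implied by digits '1'-'9' in a number region.
def increment_from_pattern(number_region: str) -> Decimal:
    digits = "".join(c for c in number_region if c in "0123456789.")
    if not any(c in "123456789" for c in digits):
        return Decimal(0)  # no digits '1'-'9': rounding is switched off
    return Decimal(digits)


# Hypothetical helper: 'pattern' rounding always uses roundHalfEven.
def round_to_increment(value, increment, mode: str = "roundHalfEven") -> Decimal:
    v, inc = Decimal(str(value)), Decimal(str(increment))
    if inc < 0:
        raise ValueError("negative rounding increment")
    if inc == 0:
        return v  # an increment of 0.0 switches rounding off
    return (v / inc).quantize(Decimal(1), rounding=ROUNDING_MODES[mode]) * inc
```

The half-even rule is visible with 1225 and an increment of 50: the quotient 24.5 rounds to 24 under 'roundHalfEven' (giving 1200) but to 25 under 'roundHalfUp' (giving 1250).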
13.6.1.2 dfdl:textNumberPattern for dfdl:textNumberRep 'zoned'
When dfdl:textNumberRep is ‘zoned’ only the pattern for positive numbers is used. It is a schema definition error if the negative pattern is specified.
Only the following pattern characters may be used:
'+' MUST BE present at the beginning or end of the pattern to indicate whether the leading or trailing digit carries the overpunched sign, if the logical type is signed
'+' MAY BE present at the beginning or end of the pattern to indicate whether the leading or trailing digit carries the overpunched sign, if the logical type is unsigned. If the logical type is unsigned and dfdl:textNumberCheckPolicy is 'lax', it is a schema definition error if no '+' is present.
'V' MAY BE used to indicate the location of an implied decimal point
'P' MAY BE used to indicate the decimal scaling
'0-9' indicates the number of required digits (including the overpunched digit).
'#' indicates the number of optional digits.
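The constraints on zoned patterns listed above can be checked as in the following Python sketch. The function name and its "leading"/"trailing"/"none" return values are hypothetical, the `ValueError`s stand in for schema definition errors, and the placement of V and P within the digits is not checked.

```python
import re


# Hypothetical helper for illustration; not part of any DFDL processor API.
def check_zoned_pattern(pattern: str, signed: bool, check_policy: str = "strict") -> str:
    """Validate a zoned pattern and report which digit carries the overpunched sign."""
    if ";" in pattern:
        raise ValueError("a negative subpattern is not allowed for zoned")
    if not re.fullmatch(r"\+?[0-9#VP]+\+?", pattern) or pattern.count("+") > 1:
        raise ValueError(f"invalid zoned pattern {pattern!r}")
    position = "leading" if pattern.startswith("+") else "trailing" if pattern.endswith("+") else "none"
    # '+' is required for signed types, and for unsigned types under the 'lax' check policy.
    if position == "none" and (signed or check_policy == "lax"):
        raise ValueError("'+' is required to locate the overpunched sign")
    return position
```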
Rounding occurs as described under Rounding in 13.6.1.1 dfdl:textNumberPattern for dfdl:textNumberRep 'standard'.
Signed numbers with dfdl:textNumberRep 'standard' and dfdl:textStandardBase 10 are mapped using the dfdl:textNumberPattern.
Signed numbers with dfdl:textNumberRep 'standard' and dfdl:textStandardBase not 10 are mapped to an unsigned representation. On unparsing the minimum number of characters to represent the digits is output and it is a processing error if the value is negative.
Signed numbers with dfdl:textNumberRep 'zoned' are mapped using the dfdl:textNumberPattern to indicate the position of the sign and virtual decimal point. On parsing if the sign is not overpunched, that is it does not have a sign, it is treated as positive. On unparsing the sign is always overpunched.
Unsigned numbers with dfdl:textNumberRep 'standard' and dfdl:textStandardBase 10 are mapped using the dfdl:textNumberPattern. On parsing it is a processing error if the data are negative.
Unsigned numbers with dfdl:textNumberRep 'standard' and dfdl:textStandardBase not 10 are mapped to an unsigned representation. On unparsing the minimum number of characters to represent the digits is output.
Unsigned numbers with dfdl:textNumberRep 'zoned' are mapped using the dfdl:textNumberPattern to indicate the position of the sign and virtual decimal point. On parsing it is a processing error if the data are negative. On unparsing the data are not overpunched with a sign.
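Two of the unparsing rules above are sketched here in Python: the minimal-digit text for bases 2, 8 and 16, and EBCDIC zoned output where the sign digit gets zone nibble C (positive) or D (negative) and unsigned values keep plain F-zone digits. Both function names are hypothetical, uppercase hexadecimal digits and the fixed `width` parameter are assumptions, and the `ValueError`s stand in for processing errors.

```python
# Hypothetical helpers for illustration; not part of any DFDL processor API.
def unparse_base(value: int, base: int) -> str:
    """Minimum digits for dfdl:textStandardBase 2, 8 or 16; no sign, no decimal point."""
    if value < 0:
        raise ValueError("negative value cannot be represented")
    return format(value, {2: "b", 8: "o", 16: "X"}[base])  # uppercase hex is an assumption


def unparse_zoned_ebcdic(value: int, width: int, signed: bool, sign_at_end: bool = True) -> bytes:
    """EBCDIC zoned bytes: zone F on plain digits, C (+) or D (-) on the sign digit."""
    if value < 0 and not signed:
        raise ValueError("negative value for an unsigned type")
    digits = str(abs(value)).rjust(width, "0")
    if len(digits) > width:
        raise ValueError("value does not fit the pattern")
    out = bytearray(0xF0 | int(d) for d in digits)
    if signed:  # signed zoned numbers are always overpunched on unparsing
        i = len(out) - 1 if sign_at_end else 0
        out[i] = (0xD0 if value < 0 else 0xC0) | (out[i] & 0x0F)
    return bytes(out)
```

Decoding the result with Python's cp037 codec shows the familiar letters: -123 in five positions comes out as "0012L".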
COBOL compilers that run on ASCII platforms have a "signed" data type that operates in a similar manner to the EBCDIC signed field; that is, they overpunch the sign on the least significant digit (LSD). However, this is not standardized in ASCII, and different compilers use different overpunch codes. For example, Computer Associates' Realia compiler uses 30 hex for positive values and 20 hex for negative values, but Micro Focus® and Microsoft® use 30 hex for positive values and 70 hex for negative values.
Copyright (C) Open Grid Forum (2005-2010). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the OGF or other organizations, except as needed for the purpose of developing Grid Recommendations in which case the procedures for copyrights defined in the OGF Document process must be followed, or as required to translate it into languages other than English.