Data Format Description Language (DFDL) v1.0 Specification
OGF Proposed Recommendation GFD-P-R.174, January 31, 2011


13.6 Properties Specific to Number with Text representation

Property Name

Description

textNumberRep

Enum

Valid values are 'standard' and 'zoned'.

'standard' means represented as characters in the 'encoding' code page.

'zoned' means represented as a zoned decimal in the 'encoding' code page. Zoned is not supported for float and double numbers.

Annotation: dfdl:element, dfdl:simpleType

textNumberJustification

Enum

Valid values are 'left', 'right' and 'center'.

Controls how the data is padded or trimmed on parsing and unparsing.

Behavior as for dfdl:textStringJustification.

Annotation: dfdl:element, dfdl:simpleType

textNumberPadCharacter

DFDL String Literal

The value that is used when padding or trimming number elements. The value can be a single character or a single byte. If a character, then it can be specified using a literal character or using DFDL entities. If a byte, then it must be specified using a single byte value entity.

If a pad character is specified when dfdl:lengthUnits='bytes' then the pad character must be a single-byte character.

If a pad byte is specified when dfdl:lengthUnits='characters' then
- the encoding must be a fixed-width encoding
- padding and trimming must be applied using a sequence of N pad bytes, where N is the width of a character in the fixed-width encoding.

Annotation: dfdl:element, dfdl:simpleType
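The trimming step on parsing can be illustrated with a short Python sketch. It assumes, following the usual reading of dfdl:textStringJustification, that 'left' justified data carries its padding on the right, 'right' justified data carries it on the left, and 'center' justified data carries it on both sides. The function name trim_number_text is hypothetical, and only the single-character case is covered (pad bytes are not handled).

```python
def trim_number_text(text: str, pad_char: str, justification: str) -> str:
    """Remove pad characters from a text number before pattern conversion.

    Assumption: the side that is trimmed is the side opposite to the
    justification ('left' justified data is padded on the right).
    """
    if len(pad_char) != 1:
        raise ValueError("this sketch handles a single pad character only")
    if justification == "left":
        return text.rstrip(pad_char)
    if justification == "right":
        return text.lstrip(pad_char)
    if justification == "center":
        return text.strip(pad_char)
    raise ValueError("unknown justification: " + justification)
```

For example, under these assumptions a right-justified field "  42" with a space pad character is trimmed to "42" before the number pattern is applied.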

textNumberPattern

String

Defines the ICU-like pattern that describes the format of the text number. The pattern defines where grouping separators, decimal separators, implied decimal points, exponents, positive signs and negative signs appear. It permits definition by either digits/fractions or significant digits. Allows rounding.

When dfdl:textNumberRep is 'standard' this property only applies when dfdl:textStandardBase is 10. When dfdl:textNumberRep is 'standard' and dfdl:textStandardBase is not 10 the number is represented as the minimum number of characters to represent the digits. There is no sign or virtual decimal point.

The syntax of dfdl:textNumberPattern is described in section 13.6.1 The textNumberPattern Property.

Annotation: dfdl:element, dfdl:simpleType

textNumberRounding

Enum

Specifies how rounding is controlled during unparsing.

Valid values are 'pattern' and 'explicit'.

When dfdl:textNumberRep is 'standard' this property only applies when dfdl:textStandardBase is 10.

If 'pattern' then the rounding increment is specified in the dfdl:textNumberPattern using digits '1' through '9'. The rounding mode is always 'roundHalfEven'. To switch off rounding, do not use digits '1' through '9'.

If 'explicit' then the rounding increment is specified by the dfdl:textNumberRoundingIncrement property, and any digits '1' through '9' in the dfdl:textNumberPattern are treated as digit '0'. The rounding mode is specified by the dfdl:textNumberRoundingMode property. To switch off rounding, use 0.0 for the dfdl:textNumberRoundingIncrement.

Annotation: dfdl:element, dfdl:simpleType

textNumberRoundingMode

Enum

Specifies how rounding occurs during unparsing, when dfdl:textNumberRounding is 'explicit'.

When dfdl:textNumberRep is 'standard' this property only applies when dfdl:textStandardBase is 10.

Valid values are 'roundCeiling', 'roundFloor', 'roundDown', 'roundUp', 'roundHalfEven', 'roundHalfDown' and 'roundHalfUp'.

Annotation: dfdl:element, dfdl:simpleType

textNumberRoundingIncrement

Double

Specifies the rounding increment to use during unparsing, when dfdl:textNumberRounding is 'explicit'.

When dfdl:textNumberRep is 'standard' this property only applies when dfdl:textStandardBase is 10.

To switch off rounding, use 0.0.

A negative value is a schema definition error.

Annotation: dfdl:element, dfdl:simpleType

textNumberCheckPolicy

Enum

Values are 'strict' and 'lax'.

Indicates how lenient to be when parsing against the pattern.

When dfdl:textNumberRep is 'standard' this property only applies when dfdl:textStandardBase is 10.

If 'lax' and dfdl:textNumberRep is 'standard' then grouping separators can be omitted, the decimal separator can be either '.' or ',' (as long as this is unambiguous), a leading positive sign can be omitted, all whitespace is treated as zero, and leading and trailing whitespace is ignored. The exponent is also optional and is assumed to be '1' if not supplied.

If 'lax' and dfdl:textNumberRep is 'zoned' then positive punched data is accepted when parsing an unsigned type, and unpunched data is accepted when parsing a signed type.

On unparsing the pattern is always followed, and the rules in 13.6.2 Converting logical numbers to/from text representation apply.

Annotation: dfdl:element, dfdl:simpleType

textStandardDecimalSeparator

DFDL String Literal or DFDL Expression

Defines the single character that will appear in the data as the decimal separator.

This property is applicable when dfdl:textNumberRep is 'standard' and dfdl:textStandardBase is 10.

This property can be computed by way of an expression which returns a character. The expression must not contain forward references to elements which have not yet been processed.

Annotation: dfdl:element, dfdl:simpleType

textStandardGroupingSeparator

DFDL String Literal or DFDL Expression

Defines the single character that will appear in the data as the grouping separator.

This property is applicable when dfdl:textNumberRep is 'standard' and dfdl:textStandardBase is 10.

This property can be computed by way of an expression which returns a character. The expression must not contain forward references to elements which have not yet been processed.

Annotation: dfdl:element, dfdl:simpleType

textStandardExponentCharacter

DFDL String Literal or DFDL Expression

Defines the actual character that will appear in the data as the exponent indicator. If the empty string is specified then no exponent character will be used.

This property is applicable when dfdl:textNumberRep is 'standard' and dfdl:textStandardBase is 10.

This property can be computed by way of an expression which returns a character. The expression must not contain forward references to elements which have not yet been processed.

If dfdl:ignoreCase is 'yes' then the case of the string is ignored by the parser.

Annotation: dfdl:element, dfdl:simpleType

textStandardInfinityRep

DFDL String Literal

The value used to represent infinity.

Infinity is represented as a string with the positive or negative prefixes and suffixes from the dfdl:textNumberPattern applied

This property is applicable when dfdl:textNumberRep is 'standard', dfdl:textStandardBase is 10 and the simple type is float or double

If dfdl:ignoreCase is 'yes' then the case of the string is ignored by the parser.

Annotation: dfdl:element, dfdl:simpleType

textStandardNanRep

DFDL String Literal

The value used to represent NaN.

NaN is represented as a string, and the positive or negative prefixes and suffixes from the dfdl:textNumberPattern are not used.

This property is applicable when dfdl:textNumberRep is 'standard', dfdl:textStandardBase is 10 and the simple type is float or double

If dfdl:ignoreCase is 'yes' then the case of the string is ignored by the parser.

Annotation: dfdl:element, dfdl:simpleType

textStandardZeroRep

List of DFDL String Literals

Valid values: empty string, any character string

The whitespace separated list of alternative literal strings that are equivalent to zero, for example the characters ‘zero’.

On unparsing the first value is used.

If dfdl:ignoreCase is 'yes' then the case of the string is ignored by the parser.

The empty string means that there is no special literal string for zero

This property is applicable when dfdl:textNumberRep is 'standard' and dfdl:textStandardBase is 10.

Annotation: dfdl:element, dfdl:simpleType

textStandardBase

Non-negative Integer

Valid Values 2, 8, 10, 16

Indicates the number base.

Only used when dfdl:textNumberRep is 'standard'.

When base is not 10, xs:decimal, xs:float and xs:double are not supported.

When dfdl:textNumberRep is 'zoned', a dfdl:textStandardBase of 10 is assumed.

Annotation: dfdl:element, dfdl:simpleType
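For bases other than 10 the text is simply the minimum number of digit characters, with no sign and no decimal point. The following Python sketch illustrates that rule. The function names are hypothetical, and two details are assumptions not stated above: hexadecimal digits are written in upper case on unparsing, and either case is accepted on parsing.

```python
_DIGITS = "0123456789ABCDEF"
_NON_DECIMAL_BASES = (2, 8, 16)


def unparse_in_base(value: int, base: int) -> str:
    """Write a non-negative integer using the fewest digits in the given base."""
    if base not in _NON_DECIMAL_BASES:
        raise ValueError("this sketch covers bases 2, 8 and 16 only")
    if value < 0:
        raise ValueError("no sign can be represented when the base is not 10")
    if value == 0:
        return "0"
    out = []
    while value:
        value, digit = divmod(value, base)
        out.append(_DIGITS[digit])
    return "".join(reversed(out))


def parse_in_base(text: str, base: int) -> int:
    """Read digit characters in the given base; signs and separators are rejected."""
    if base not in _NON_DECIMAL_BASES:
        raise ValueError("this sketch covers bases 2, 8 and 16 only")
    allowed = _DIGITS[:base]
    if not text or any(ch not in allowed for ch in text.upper()):
        raise ValueError("not a valid base-%d number: %r" % (base, text))
    return int(text, base)
```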

textZonedSignStyle

Enum

Specifies the characters that are used to overpunch the sign nibble when the encoding is an ASCII character set. The location of this sign nibble is indicated in the dfdl:textNumberPattern.

This property is applicable when dfdl:textNumberRep is 'zoned'

Used only when encoding specifies an ASCII-derived character set. That is printable character codepoints 0x20 - 0x7E are the same as US-ASCII. This includes all the Unicode character sets, and all variations of ASCII and ISO-8859.

Valid values are 'asciiStandard', 'asciiTranslatedEBCDIC' and 'asciiCARealiaModified'.

Which characters are used to represent 'overpunched' (included) positive and negative signs varies by encoding, COBOL compiler and system. It is fixed for EBCDIC systems but not for ASCII.

In EBCDIC-based encodings, characters '{ABCDEFGHI' or '0123456789' represent a positive sign and digits 0 to 9 (character codes 0xC0 to 0xC9 or 0xF0 to 0xF9). The characters '}JKLMNOPQR' or '^£¥·©§¶¼½¾' represent a negative sign and digits 0 to 9 (character codes 0xD0 to 0xD9 or 0xB0 to 0xB9). On parsing both ranges of characters will be accepted. On unparsing the range 0xC0 to 0xC9 will be produced for positive signs and 0xD0 to 0xD9 will be produced for negative signs.

asciiStandard: ASCII characters '0123456789' represent a positive sign and the corresponding digit. (The sign nibble for '+' is 0x3, which is the high nibble of these character codes unmodified.) ASCII characters 'pqrstuvwxy' represent a negative sign and digits 0 to 9 (character codes 0x70 to 0x79).

asciiTranslatedEBCDIC: The overpunched character is the ASCII equivalent of the EBCDIC above. So the characters '{ABCDEFGHI' still represent a positive sign and digits 0 to 9 (character codes 0x7B, 0x41 through 0x49). The characters '}JKLMNOPQR' still represent a negative sign and digits 0 to 9 (character codes 0x7D, 0x4A through 0x52). This case comes up if EBCDIC zoned decimal data is translated to ASCII as if it were textual data.

asciiCARealiaModified (see footnote 1): In this style, the ASCII characters '0123456789' represent a positive sign and digits 0 to 9 as in asciiStandard. However, ASCII characters from code 0x20 to 0x29 are used for a negative sign and the corresponding decimal digit. This does not translate well into printing characters. These characters include the space (' ') for zero, the characters '!"#$%&' for 1 through 6, the single quote character (') for 7, and the parentheses '()' for 8 and 9.

Annotation: dfdl:element, dfdl:simpleType
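The three ASCII styles can be summarised as lookup tables from the overpunched character to a (sign, digit) pair. The Python sketch below is illustrative only. It assumes the overpunched sign sits on the last digit, whereas in general the dfdl:textNumberPattern indicates its location. The negative zero character '}' for asciiTranslatedEBCDIC is taken from character code 0x7D given above. The names ASCII_OVERPUNCH and decode_zoned are hypothetical.

```python
from typing import Dict, Tuple


def _table(positive: str, negative: str) -> Dict[str, Tuple[int, int]]:
    """Map each overpunch character to (sign, digit); position in the string is the digit."""
    table = {}
    for digit, ch in enumerate(positive):
        table[ch] = (1, digit)
    for digit, ch in enumerate(negative):
        table[ch] = (-1, digit)
    return table


ASCII_OVERPUNCH = {
    "asciiStandard": _table("0123456789", "pqrstuvwxy"),
    "asciiTranslatedEBCDIC": _table("{ABCDEFGHI", "}JKLMNOPQR"),
    # Negative digits use codes 0x20 to 0x29: space ! " # $ % & ' ( )
    "asciiCARealiaModified": _table("0123456789", " !\"#$%&'()"),
}


def decode_zoned(text: str, style: str) -> int:
    """Decode an ASCII zoned decimal whose final character carries the sign (assumption)."""
    table = ASCII_OVERPUNCH[style]
    if not text:
        raise ValueError("empty zoned decimal")
    body, last = text[:-1], text[-1]
    if last not in table or any(ch not in "0123456789" for ch in body):
        raise ValueError("not a valid zoned decimal for %s: %r" % (style, text))
    sign, digit = table[last]
    return sign * int(body + str(digit))
```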

13.6.1 The textNumberPattern Property

The dfdl:textNumberPattern describes how to parse and unparse text representations of number logical types with base 10.

The length of the representation of the number is determined first, and the number pattern is used only for conversion of the text to and from a numeric logical infoset value.

The pattern described below is derived from the ICU DecimalFormat class described here: [ICUDecForm]

The pattern is an ICU-like syntax that defines where grouping separators, decimal separators, implied decimal points, exponents, positive signs and negative signs appear. It permits definition by either digits/fractions or significant digits.

If the pattern uses digits/fractions then these must match any XML schema facets. If not it is a schema definition error.

13.6.1.1 dfdl:textNumberPattern for dfdl:textNumberRep 'standard'

When dfdl:textNumberRep is 'standard' this property only applies when dfdl:textStandardBase is 10

The pattern comes in two parts separated by a semi-colon. The first is mandatory and applies to positive numbers, the second is optional and applies to negative numbers.

Examples: The first shows digits/fractions and positive/negative signs, the second shows an exponent, the third shows an implied decimal point.

+###,##0.00;(###,##0.00)

##0.0#E0

+###V#0

Note that 'V' is used to indicate the location of an implied decimal point for fixed point number representations. (This is an extension to the ICU pattern language.)

The actual grouping separator, decimal separator and exponent characters are defined independently of the pattern.

The actual positive sign and negative sign are defined within the pattern itself.

Many characters in a pattern are taken literally; they are matched during parsing and output unchanged during formatting. Special characters, on the other hand, stand for other characters, strings, or classes of characters. For example, the '#' character is replaced by a digit.

To insert a special character in a pattern as a literal, that is, without any special meaning, the character must be quoted. There are some exceptions to this which are noted below.

Symbol  Location                   Meaning
0       Number                     Digit
1-9     Number                     '1' through '9' indicates rounding.
#       Number                     Digit, zero shows as absent
.       Number                     Decimal separator or monetary decimal separator
-       Number                     Minus sign
,       Number                     Grouping separator
E       Number                     Separates mantissa and exponent in scientific notation. Need not be quoted in prefix or suffix.
+       Exponent                   Prefix positive exponents with plus sign. Need not be quoted in prefix or suffix.
;       Subpattern boundary        Separates positive and negative subpatterns
'       Prefix or suffix           Used to quote special characters in a prefix or suffix, for example, "'#'#" formats 123 to "#123". To create a single quote itself, use two in a row: "# o''clock".
*       Prefix or suffix boundary  Pad escape, precedes pad character
V       Number                     Virtual decimal point marker. Only used with decimal, float and double simple types.
P       Number                     Decimal scaling position. Only used with decimal, float and double simple types.

Table 20 dfdl:textNumberPattern special characters

A pattern contains a positive and negative subpattern, for example, "#,##0.00;(#,##0.00)". Each subpattern has a prefix, a numeric part, and a suffix. If there is no explicit negative subpattern, the negative subpattern is the minus sign prefixed to the positive subpattern. That is, "0.00" alone is equivalent to "0.00;-0.00". If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are ignored in the negative subpattern. That means that "#,##0.0#;(#)" has precisely the same result as "#,##0.0#;(#,##0.0#)".
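The subpattern rule above can be sketched in Python as follows. This is a minimal illustration, not a full pattern parser: it splits at the first unquoted ';' and, when no negative subpattern is given, derives one by prefixing a minus sign. The function name split_subpatterns is hypothetical, and the sketch does not reduce an explicit negative subpattern to its prefix and suffix.

```python
from typing import Tuple


def split_subpatterns(pattern: str) -> Tuple[str, str]:
    """Return (positive, negative) subpatterns of a number pattern."""
    in_quote = False
    for index, ch in enumerate(pattern):
        if ch == "'":
            # A doubled quote toggles twice, so the quoting state is unchanged.
            in_quote = not in_quote
        elif ch == ";" and not in_quote:
            return pattern[:index], pattern[index + 1:]
    # No explicit negative subpattern: minus sign prefixed to the positive one.
    return pattern, "-" + pattern
```

With this sketch, "0.00" yields the pair ("0.00", "-0.00"), matching the equivalence stated above.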

The prefixes, suffixes, and various symbols used for infinity, digits, grouping separators, decimal separators, etc. may be set to arbitrary values, and they will appear properly during formatting. However, care must be taken that the symbols and strings do not conflict, or parsing will be unreliable. For example, either the positive and negative prefixes or the suffixes must be distinct for parse to be able to distinguish positive from negative values. Another example is that the decimal separator and grouping separator should be distinct characters, or parsing will be impossible.

The grouping separator is a character that separates clusters of integer digits to make large numbers more legible. It is commonly used for thousands, but in some locales it separates ten-thousands. The grouping size is the number of digits between the grouping separators, such as 3 for "100,000,000" or 4 for "1 0000 0000". There are actually two different grouping sizes: one used for the least significant integer digits, the primary grouping size, and one used for all others, the secondary grouping size. In most locales these are the same, but sometimes they are different. For example, if the primary grouping interval is 3, and the secondary is 2, then this corresponds to the pattern "#,##,##0", and the number 123456789 is formatted as "12,34,56,789". If a pattern contains multiple grouping separators, the interval between the last one and the end of the integer defines the primary grouping size, and the interval between the last two defines the secondary grouping size. All others are ignored, so "#,##,###,####" == "###,###,####" == "##,#,###,####".
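The derivation of the two grouping sizes, and their effect on formatting, can be sketched in Python. The sketch assumes it is given only the integer part of a subpattern (no prefix, suffix, fraction or quoting) and uses ',' as the separator in the output. The function names are hypothetical.

```python
from typing import Tuple


def grouping_sizes(integer_pattern: str) -> Tuple[int, int]:
    """Return (primary, secondary) grouping sizes, or (0, 0) if there is no grouping."""
    parts = integer_pattern.split(",")
    if len(parts) == 1:
        return 0, 0
    primary = len(parts[-1])  # digits after the last separator
    # Interval between the last two separators, if there are two or more.
    secondary = len(parts[-2]) if len(parts) > 2 else primary
    return primary, secondary


def group_digits(digits: str, primary: int, secondary: int, sep: str = ",") -> str:
    """Insert separators into a string of integer digits."""
    if primary <= 0 or len(digits) <= primary:
        return digits
    if secondary <= 0:
        secondary = primary
    head, tail = digits[:-primary], digits[-primary:]
    groups = [tail]
    while len(head) > secondary:
        groups.append(head[-secondary:])
        head = head[:-secondary]
    groups.append(head)
    return sep.join(reversed(groups))
```

Applied to the example above, grouping_sizes("#,##,##0") gives (3, 2), and grouping "123456789" with those sizes gives "12,34,56,789".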

The P symbol is used to specify the location of an assumed decimal point when the point is not within the number that appears in the data.

The symbol P can be specified only as a continuous string of Ps in the leftmost or rightmost digit positions in the number region of the pattern. The decimal point symbol V is assumed as either the leftmost or rightmost character of the number region.

It is a schema definition error if any symbols other than "0", "1"-"9" or "#" are used in the same number region of the pattern as V or P.

Examples

Data representation  Pattern  Value
123                  PP000    0.00123
123                  000PP    12300
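The two examples can be reproduced with a small Python sketch. It assumes the number region contains only digit positions and a contiguous run of Ps at one end, and that the data consists of plain decimal digits. The function name apply_p_scaling is hypothetical.

```python
from decimal import Decimal


def apply_p_scaling(digits: str, number_region: str) -> Decimal:
    """Apply the decimal scaling implied by leading or trailing 'P' symbols."""
    leading = len(number_region) - len(number_region.lstrip("P"))
    trailing = len(number_region) - len(number_region.rstrip("P"))
    if (leading and trailing) or "P" in number_region.strip("P"):
        raise ValueError("P must be one contiguous run at the left or right edge")
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("data must consist of decimal digits")
    value = Decimal(digits)
    if leading:
        # The assumed decimal point lies `leading` positions left of the first digit.
        return value.scaleb(-(leading + len(digits)))
    # Each trailing P stands for one assumed zero before the decimal point.
    return value.scaleb(trailing)
```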

Pattern BNF
 pattern    := subpattern (';' subpattern)?
 subpattern := prefix? ((number exponent?) | vpinteger) suffix?
 number     := (integer ('.' fraction)?) 

 vpinteger  := (pinteger | vinteger)
 pinteger   := ('P'* integer) | (integer 'P'* )  
 vinteger   := ('V'? integer) |
               ('#'* 'V'? integer)|
               ('#'* '0'* 'V'? '0'* '0')|
               (integer 'V'?) 

 prefix     := '\u0000'..'\uFFFD' - specialCharacters
 suffix     := '\u0000'..'\uFFFD' - specialCharacters
 integer    := '#'* '0'* '0'
 fraction   := '0'* '#'*
 exponent   := 'E' '+'? '0'* '0'
 padSpec    := '*' padChar
 padChar    := '\u0000'..'\uFFFD' - quote
  
 Notation:
   X*       0 or more instances of X
   X?       0 or 1 instances of X
   X|Y      either X or Y
   C..D     any character from C up to D, inclusive
   S-T      characters in S, except those in T
dfdl:textNumberPattern syntax

The first subpattern is for positive numbers. The second (optional) subpattern is for negative numbers.

Not indicated in the BNF syntax above:

Parsing

During parsing, grouping separators are removed from the data.

Formatting

Formatting is guided by several parameters all of which can be specified using a pattern. The following description applies to formats that do not use scientific notation.

Special Values

NaN is represented as a string determined by the dfdl:textStandardNanRep property. This is the only value for which the prefixes and suffixes are not used.

Infinity is represented as a string with the positive or negative prefixes and suffixes applied. The infinity string is determined by the dfdl:textStandardInfinityRep property.

Scientific Notation

Numbers in scientific notation are expressed as the product of a mantissa and a power of ten, for example, 1234 can be expressed as 1.234 x 10^3. The mantissa is typically in the half-open interval [1.0, 10.0) or sometimes [0.0, 1.0), but it need not be. In a pattern, the exponent character immediately followed by one or more digit characters indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3".

Padding

Padding may be specified through the pattern syntax. In a pattern the pad escape character, followed by a single pad character, causes padding to be parsed and formatted. The pad escape character is '*'. For example, "*x#,##0.00" formats 123 to "xx123.00", and 1234 to "1,234.00".

Note: This padding is in addition to the normal DFDL text padding.
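The pattern padding example can be reproduced with the Python sketch below. It rests on two assumptions that are not stated above: the pad escape appears at the very start of the pattern, and the target width is the number of characters in the rest of the pattern. The function name pad_to_pattern_width is hypothetical, and the input is the number text already formatted by the rest of the pattern.

```python
def pad_to_pattern_width(formatted: str, pattern: str) -> str:
    """Left-pad already formatted number text using a leading '*<padChar>' escape."""
    if len(pattern) < 2 or not pattern.startswith("*"):
        raise ValueError("this sketch expects the pattern to start with a pad escape")
    pad_char = pattern[1]
    # Assumption: the width is the length of the pattern without the pad escape.
    width = len(pattern) - 2
    return formatted.rjust(width, pad_char)
```

Under these assumptions the pattern "*x#,##0.00" has a width of 8, so "123.00" becomes "xx123.00" and "1,234.00" is left unchanged.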

Rounding

How rounding is controlled is given by dfdl:textNumberRounding. The rounding increment may be specified in the dfdl:textNumberPattern itself using digits '1' through '9' or using an explicit increment in dfdl:textNumberRoundingIncrement. For example, 1230 rounded to the nearest 50 is 1250. 1.234 rounded to the nearest 0.65 is 1.3.

Using an explicit rounding increment, dfdl:textNumberRoundingMode determines how values are rounded.
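Rounding to an increment can be sketched with Python's decimal module. The mapping from the DFDL rounding mode names to the decimal module's constants is an assumption that each name carries its conventional meaning. The names ROUNDING_MODES and round_to_increment are hypothetical. An increment of zero switches rounding off and a negative increment is rejected, as described for dfdl:textNumberRoundingIncrement.

```python
from decimal import (
    Decimal,
    ROUND_CEILING,
    ROUND_DOWN,
    ROUND_FLOOR,
    ROUND_HALF_DOWN,
    ROUND_HALF_EVEN,
    ROUND_HALF_UP,
    ROUND_UP,
)

# Assumed correspondence between DFDL mode names and decimal module constants.
ROUNDING_MODES = {
    "roundCeiling": ROUND_CEILING,
    "roundFloor": ROUND_FLOOR,
    "roundDown": ROUND_DOWN,
    "roundUp": ROUND_UP,
    "roundHalfEven": ROUND_HALF_EVEN,
    "roundHalfDown": ROUND_HALF_DOWN,
    "roundHalfUp": ROUND_HALF_UP,
}


def round_to_increment(value: Decimal, increment: Decimal,
                       mode: str = "roundHalfEven") -> Decimal:
    """Round value to the nearest multiple of increment using the named mode."""
    if increment < 0:
        raise ValueError("a negative rounding increment is a schema definition error")
    if increment == 0:
        return value  # rounding switched off
    # Count whole increments, rounding that count with the requested mode.
    steps = (value / increment).quantize(Decimal(1), rounding=ROUNDING_MODES[mode])
    return steps * increment
```

With the default mode this reproduces the two examples above: 1230 with an increment of 50 gives 1250, and 1.234 with an increment of 0.65 gives 1.3.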

13.6.1.2 dfdl:textNumberPattern for dfdl:textNumberRep 'zoned'

When dfdl:textNumberRep is ‘zoned’ only the pattern for positive numbers is used. It is a schema definition error if the negative pattern is specified.

Only the following pattern characters may be used:

Rounding occurs as described under Rounding in 13.6.1.1 dfdl:textNumberPattern for dfdl:textNumberRep 'standard'.

13.6.2 Converting logical numbers to/from text representation

1 Reference for this CA Realia 0x20 overpunch for negative sign is the article: "EBCDIC to ASCII Conversion of Signed Fields" at http://www.discinterchange.com/TechTalk_signed_fields_.html, where it says:

COBOL compilers that run on ASCII platforms have a "signed" data type that operates in a similar manner to the EBCDIC Signed field -- that is, they over punch the sign on the LSD. However, this is not standardized in ASCII, and different compilers use different overpunch codes. For example, Computer Associates' Realia compiler uses a 30 hex for positive values and a 20 hex for negative values, but Micro Focus® and Microsoft® use 30 hex for positive values and 70 hex for negative values.


Copyright (C) Open Grid Forum (2005-2010). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the OGF or other organizations, except as needed for the purpose of developing Grid Recommendations in which case the procedures for copyrights defined in the OGF Document process must be followed, or as required to translate it into languages other than English.