Upgrading Enterprise COBOL Version 4 Release 1 programs that have XML PARSE statements and that use the XMLPARSE(XMLSS) compiler option

The behavior of XML PARSE with the XMLPARSE(XMLSS) compiler option in effect differs between Enterprise COBOL Version 4 Release 1 and Enterprise COBOL Version 4 Release 2 or later. In Enterprise COBOL Version 4 Release 1, when you used the XMLPARSE(XMLSS) compiler option to parse an XML document that contained character references that could not be expressed in the encoding of the document, the result was a single ATTRIBUTE-CHARACTERS or CONTENT-CHARACTERS XML event in which every unrepresentable character reference was replaced by a hyphen-minus (-). The program received no indication that the substitution had occurred.

For example, parsing the content of the following XML element:
<elem>abc&#x1234;xyz</elem>
under Enterprise COBOL Version 4 Release 1, with encoding CCSID 1140 and the XMLPARSE(XMLSS) compiler option in effect, resulted in a single CONTENT-CHARACTERS XML event with special register XML-TEXT containing the (EBCDIC) string:
abc-xyz
and with special register XML-CODE containing zero.

In Enterprise COBOL Version 4 Release 2 and later, when you parse such an XML document using the XMLPARSE(XMLSS) compiler option, multiple XML events occur instead of a single ATTRIBUTE-CHARACTERS or CONTENT-CHARACTERS event. Each unrepresentable character reference that was previously replaced by a hyphen-minus is instead expressed as an ATTRIBUTE-NATIONAL-CHARACTER or CONTENT-NATIONAL-CHARACTER XML event, depending on the context in which it occurred. These XML events were not produced under the XMLPARSE(XMLSS) compiler option in Version 4 Release 1, so a processing procedure written for that release might not handle them.

Parsing the content of the same XML element:
<elem>abc&#x1234;xyz</elem>
under Enterprise COBOL Version 4 Release 2 and later results in the following sequence of XML events:
  • CONTENT-CHARACTERS with XML-TEXT containing abc
  • CONTENT-NATIONAL-CHARACTER with XML-NTEXT containing NX'1234'
  • CONTENT-CHARACTERS with XML-TEXT containing xyz
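
A processing procedure that previously relied on receiving the whole element content in one CONTENT-CHARACTERS event might need to reassemble the content from the separate events. The following is a minimal sketch of one possible approach, not a prescribed migration technique. It assumes that the program is compiled with XMLPARSE(XMLSS) and a CODEPAGE setting that matches CCSID 1140; the program name and the data names XML-DOC, CONTENT-BUF, and BUF-PTR are hypothetical. The sketch converts each XML-TEXT fragment to national (UTF-16) form and appends it, together with each XML-NTEXT character, to a single national buffer:

```cobol
      * Sketch only. Assumes XMLPARSE(XMLSS) and CODEPAGE(1140).
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NATCHAR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names; the document is the example shown above.
       01  XML-DOC        PIC X(27)
               VALUE '<elem>abc&#x1234;xyz</elem>'.
      * National buffer that collects the element content.
       01  CONTENT-BUF    PIC N(80) VALUE SPACES.
       01  BUF-PTR        PIC 9(4) COMP VALUE 1.
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE XML-DOC
               WITH ENCODING 1140
               PROCESSING PROCEDURE HANDLE-EVENT
               ON EXCEPTION
                   DISPLAY 'XML PARSE failed, XML-CODE = ' XML-CODE
           END-XML
           GOBACK.
       HANDLE-EVENT.
           EVALUATE XML-EVENT
      * Representable text arrives in XML-TEXT; convert and append.
               WHEN 'CONTENT-CHARACTERS'
                   STRING FUNCTION NATIONAL-OF(XML-TEXT)
                       DELIMITED BY SIZE
                       INTO CONTENT-BUF WITH POINTER BUF-PTR
                   END-STRING
      * An unrepresentable character reference arrives in XML-NTEXT.
               WHEN 'CONTENT-NATIONAL-CHARACTER'
                   STRING XML-NTEXT DELIMITED BY SIZE
                       INTO CONTENT-BUF WITH POINTER BUF-PTR
                   END-STRING
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

A program that must keep the Version 4 Release 1 result could instead append a hyphen-minus when a CONTENT-NATIONAL-CHARACTER event occurs. Attribute values would need equivalent handling for the ATTRIBUTE-CHARACTERS and ATTRIBUTE-NATIONAL-CHARACTER events.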