XML parsers and domains

You can use the XML domains to model messages that conform to the W3C XML standard.

You can create message models to represent XML messages using one of the XML domains (XMLNSC, XMLNS, or XML); see The XML domains. Messages in these domains are processed by the XML parsers.

WebSphere Message Broker uses the XML parsers to read and write XML messages that belong to the XMLNSC, XMLNS, or XML domains, without using a message model. When reading an XML message, the XML parsers build a message tree from an input bit stream. The input bit stream must be a well-formed XML document that conforms to the W3C XML Specification (version 1.0 or 1.1). When writing a message, the XML parsers create an XML bit stream from a message tree. The XML parsers are programmatic and never use a message model at run time, but it is good practice to create and use a message model for design-time purposes because it can simplify the construction of your message flow application; see Why model messages?.
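
As a rough illustration of the read and write cycle described above, the following Python sketch uses the standard xml.etree.ElementTree module. It is not a broker API, and the function names parse_message and write_message are hypothetical; the sketch only shows the general idea of building a tree from a bit stream, writing it back, and rejecting input that is not well formed.

```python
import xml.etree.ElementTree as ET


def parse_message(bit_stream: bytes) -> ET.Element:
    """Build an in-memory tree from an XML bit stream.

    Hypothetical helper: raises ET.ParseError if the input is not well formed.
    """
    return ET.fromstring(bit_stream)


def write_message(tree: ET.Element) -> bytes:
    """Serialize a tree back to an XML bit stream (hypothetical helper)."""
    return ET.tostring(tree, encoding="utf-8")


# Reading: a well-formed bit stream becomes a tree that can be navigated.
well_formed = b"<Envelope><Body><Element01>Value01</Element01></Body></Envelope>"
tree = parse_message(well_formed)
element01_value = tree.find("Body/Element01").text

# Writing: the tree is turned back into a bit stream.
round_trip = write_message(tree)

# A document with a mismatched end tag is not well formed and is rejected.
malformed = b"<Envelope><Body></Envelope>"
try:
    parse_message(malformed)
    rejected = False
except ET.ParseError:
    rejected = True
```

No message model is involved at any point in this sketch; the tree is built purely from the content of the bit stream.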

For details of how the XML parser handles null elements, see The XML parser and null values.

The WebSphere Message Broker documentation summarizes the XML terminology, concepts, and message constructs that are important when you use XML messages with brokers and message flows. For further information about XML, see the developerWorks Web site.

Example XML message parsing

The name elements that are used in this description (for example, XmlDecl) are provided by WebSphere Message Broker, and are referred to as correlation names. You can use them symbolically in the ESQL that defines how the nodes in a message flow, such as a Compute or Filter node, process message content. The correlation names are not part of the XML specification. Each XML parser defines its own set of correlation names because the handling of XML content varies.

The correlation names for XML name elements (for example, Element and XmlDecl) equate to a constant value of the form 0x01000000. You can see these constants used in the output that is created by the Trace node when a message, or a portion of the message, is traced.

A simple XML message might take the following form:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Envelope
PUBLIC "http://www.ibm.com/dtds" "example.dtd"
[<!ENTITY Example_ID "ST_TimeoutNodes Timeout Request Input Test Message">]
>
<Envelope version="1.0">
	<Header>
		<Example>&Example_ID;</Example>
		<!-- This is a comment  -->
	</Header>
	<Body  version="1.0">
		<Element01>Value01</Element01>
		<Element02/>
		<Element03>
			<Repeated>ValueA</Repeated>
			<Repeated>ValueB</Repeated>
		</Element03>
		<Element04><P>This is <B>bold</B> text</P></Element04>
	</Body>
</Envelope>

The following sections show the output that is created by the Trace node after this message has been parsed by the XML parser and by the XMLNSC parser. Comparing the two traces shows the differences in the internal structures that each parser uses to represent the data as it is processed by the broker.

Example XML Message parsed in the XML domain

In the following example, the WhiteSpace elements within the tree are present because of the spaces, tabs, and line breaks that format the original XML document; for clarity, the actual characters in the trace have been replaced with "WhiteSpace". WhiteSpace within an XML element does have business meaning and is represented using the Content syntax element. In the XML domain, the XmlDecl, DTD, and comments are represented by syntax elements that have explicit correlation names.

(0x01000010):XML        = (
    (0x05000018):XML      = (
      (0x06000011): = '1.0'
      (0x06000012): = 'UTF-8'
      (0x06000014): = 'no'
    )
    (0x06000002):         = 'WhiteSpace'
    (0x05000020):Envelope = (
      (0x06000004): = 'http://www.ibm.com/dtds'
      (0x06000008): = 'example.dtd'
      (0x05000021): = (
        (0x05000011):Example_ID = (
          (0x06000041): = 'ST_TimeoutNodes Timeout Request Input Test Message'
        )
      )
    )
    (0x06000002):         = 'WhiteSpace'
    (0x01000000):Envelope = (
      (0x03000000):version = '1.0'
      (0x02000000):        = 'WhiteSpace'
      (0x01000000):Header  = (
        (0x02000000):        = 'WhiteSpace'
        (0x01000000):Example = (
          (0x06000020): = 'Example_ID'
          (0x02000000): = 'ST_TimeoutNodes Timeout Request Input Test Message'
          (0x06000021): = 'Example_ID'
        )
        (0x02000000):        = 'WhiteSpace'
        (0x06000018):        = ' This is a comment  '
        (0x02000000):        = 'WhiteSpace'
      )
      (0x02000000):        = 'WhiteSpace'
      (0x01000000):Body    = (
        (0x03000000):version   = '1.0'
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element01 = (
          (0x02000000): = 'Value01'
        )
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element02 = 
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element03 = (
          (0x02000000):         = 'WhiteSpace'
          (0x01000000):Repeated = (
            (0x02000000): = 'ValueA'
          )
          (0x02000000):         = 'WhiteSpace'
          (0x01000000):Repeated = (
            (0x02000000): = 'ValueB'
          )
          (0x02000000):         = 'WhiteSpace'
        )
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element04 = (
          (0x01000000):P = (
            (0x02000000):  = 'This is '
            (0x01000000):B = (
              (0x02000000): = 'bold'
            )
            (0x02000000):  = ' text'
          )
        )
        (0x02000000):          = 'WhiteSpace'
      )
      (0x02000000):        = 'WhiteSpace'
    )
)
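
The WhiteSpace entries in the trace above have a close analogue in other tree-building parsers. The following Python sketch uses the standard xml.dom.minidom module (not the broker's XML parser) on the Element03 fragment of the example message, and labels text nodes that contain only formatting characters as "WhiteSpace" in the same way that the trace does. The describe function is a hypothetical helper written for this illustration.

```python
from xml.dom import minidom

# The Element03 fragment of the example message, including the line breaks
# and tabs that were used to format it.
document = minidom.parseString(
    "<Element03>\n"
    "\t<Repeated>ValueA</Repeated>\n"
    "\t<Repeated>ValueB</Repeated>\n"
    "</Element03>"
)


def describe(node):
    """Hypothetical helper: label a DOM node the way the trace above does."""
    if node.nodeType == node.TEXT_NODE:
        # Text made up only of spaces, tabs, and line breaks is formatting.
        return "WhiteSpace" if node.data.strip() == "" else repr(node.data)
    return node.tagName


children = [describe(child) for child in document.documentElement.childNodes]
print(children)
# ['WhiteSpace', 'Repeated', 'WhiteSpace', 'Repeated', 'WhiteSpace']
```

The five children mirror the five entries directly under Element03 in the trace: three formatting-only text nodes and two Repeated elements.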

Example XML Message parsed in the XMLNSC domain

The following trace shows the elements that are created to represent the same XML structure within the compact XMLNSC parser in its default mode. In this mode, the compact parser does not retain comments, processing instructions, or mixed text.

The example illustrates the significant saving in the number of syntax elements that are used to represent the same business content of the example XML message when using the compact parser.

Because mixed text is not retained, the WhiteSpace elements that have no business data content no longer take up any runtime footprint in the broker message tree. However, the mixed text in Element04.P is also discarded, and only the value of the child folder, Element04.P.B, is held in the tree; the text "This is " and " text" in P is discarded. This type of XML structure is not typically associated with business data formats, so use of the compact XMLNSC parser is typically desirable. If you do need this type of processing, either do not use the XMLNSC parser, or use it with the Retain mixed text mode enabled.
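
To make the term mixed text concrete, the following Python sketch parses the content of Element04 with the standard xml.etree.ElementTree module and separates the mixed text from the child value. The compact_view function is a hypothetical helper that only imitates the effect described above (keeping text for elements that have no child elements); it is not how the XMLNSC parser is implemented.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<P>This is <B>bold</B> text</P>")
b = p.find("B")

# ElementTree stores mixed text in two places: .text holds the text before
# the first child, and a child's .tail holds the text that follows it.
mixed_text = [p.text, b.tail]
child_value = b.text


def compact_view(element):
    """Hypothetical helper: keep text only for elements without children.

    Text that sits alongside child elements (mixed text) is dropped.
    """
    children = list(element)
    if not children:
        return element.text
    return [(child.tag, compact_view(child)) for child in children]


compact = compact_view(p)
print(mixed_text)   # ['This is ', ' text']
print(compact)      # [('B', 'bold')]
```

Only the value of B survives in the compact view, which matches the Element04.P.B entry that remains in the compact trace.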

The compact parser also handles the XML declaration differently: the version, encoding, and standalone attributes are held as children of the XmlDeclaration element, rather than as elements with special correlation names.

(0x01000000):XMLNSC     = (
    (0x01000400):XmlDeclaration = (
      (0x03000100):Version    = '1.0'
      (0x03000100):Encoding   = 'UTF-8'
      (0x03000100):StandAlone = 'no'
    )
    (0x01000000):Envelope       = (
      (0x03000100):version = '1.0'
      (0x01000000):Header  = (
        (0x03000000):Example = 'ST_TimeoutNodes Timeout Request Input Test Message'
      )
      (0x01000000):Body    = (
        (0x03000100):version   = '1.0'
        (0x03000000):Element01 = 'Value01'
        (0x01000000):Element02 = 
        (0x01000000):Element03 = (
          (0x03000000):Repeated = 'ValueA'
          (0x03000000):Repeated = 'ValueB'
        )
        (0x01000000):Element04 = (
          (0x01000000):P = (
            (0x03000000):B = 'bold'
          )
        )
      )
    )
)
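
The saving in syntax elements can be checked by counting the entries in each trace. The following Python sketch is a minimal illustration that assumes each syntax element appears in Trace node output as a parenthesized eight-digit hexadecimal constant followed by a colon, as in the listings in this topic. The function name count_syntax_elements is hypothetical, and the two excerpts are the Element03 subtrees copied from the traces above.

```python
import re

# Assumption: every syntax element in the trace starts with "(0xNNNNNNNN):".
ELEMENT_PATTERN = re.compile(r"\(0x[0-9A-Fa-f]{8}\):")


def count_syntax_elements(trace_text: str) -> int:
    """Count the syntax elements listed in a fragment of Trace node output."""
    return len(ELEMENT_PATTERN.findall(trace_text))


# Element03 subtree as traced in the XML domain.
xml_domain_excerpt = """
(0x01000000):Element03 = (
  (0x02000000):         = 'WhiteSpace'
  (0x01000000):Repeated = (
    (0x02000000): = 'ValueA'
  )
  (0x02000000):         = 'WhiteSpace'
  (0x01000000):Repeated = (
    (0x02000000): = 'ValueB'
  )
  (0x02000000):         = 'WhiteSpace'
)
"""

# The same subtree as traced in the XMLNSC domain.
xmlnsc_excerpt = """
(0x01000000):Element03 = (
  (0x03000000):Repeated = 'ValueA'
  (0x03000000):Repeated = 'ValueB'
)
"""

xml_count = count_syntax_elements(xml_domain_excerpt)
xmlnsc_count = count_syntax_elements(xmlnsc_excerpt)
print(xml_count, xmlnsc_count)  # 8 3
```

For this subtree, the XML domain uses eight syntax elements where the compact parser uses three.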

Most of the samples in the Samples Gallery use the XML parser to process messages.