IBM Books

XML Extender Administration and Programming

Planning for XML collections

When planning for XML collections, you have different considerations for composing documents from DB2 data, decomposing XML document into DB2 data, or both. The following sections address planning issues for XML collections, and address composition and decomposition considerations.

Validation

After you choose an access and storage method, you can determine whether to validate your data. You validate XML data using a DTD. Using a DTD ensures that the XML document is valid and lets you perform structured searches on your XML data. The DTD is stored in the DTD repository.

Recommendation: Validate XML data with a DTD. To validate, you need to have a DTD in the XML Extender repository. To learn how to insert a DTD into the repository, see Storing a DTD in the DTD repository. The DTD requirements differ depending on whether you are composing or decomposing XML documents.

Important: Make the decision whether to validate XML data before inserting XML data into DB2. The XML Extender does not support the validation of data that has already been inserted into DB2.

Considerations:

The DAD file

For XML collections, the DAD file maps the structure of the XML document to the DB2 tables from which you either compose the document, or to where you decompose the document.

For example, if you have an element called <Tax> in your XML document, you might need to map <Tax> to a column called TAX. You define the relationship between the XML data and the relational data in the DAD.

The DAD file is specified either while enabling a collection, or when you use the DAD file in XML collection stored procedures. The DAD is an XML-formatted document, residing at the client. If you choose to validate XML documents with a DTD, the DAD file can be associated with that DTD. When used as the input parameter of the XML Extender stored procedures, the DAD file has a data type of CLOB.

To specify the XML collection access and storage method, you use the following tag in the DAD file:

<Xcollection>
Specifies that the XML data is either to be decomposed from XML documents into a collection of relational tables, or to be composed into XML documents from a collection of relational tables.

An XML collection is a virtual name for a set of relational tables that contains XML data. Applications can enable an XML collection of any user tables. These user tables can be existing tables of legacy business data or tables that the XML Extender recently created. You access XML collection data mainly through the stored procedures that the XML Extender provides.

The DAD file defines the XML document tree structure, using the following kinds of nodes:

root_node
Specifies the root element of the document.

element_node
Identifies an element, while can be the root element or a child element.

text_node
Represents the CDATA text of an element.

attribute_node
Represents an attribute of an element.

Figure 10 shows a fragment of the mapping that is used in a DAD file. The nodes map the XML document content to table columns in a relational table.

Figure 10. Node definitions

<?xml version="1.0"?> 
<!DOCTYPE DAD SYSTEM "c:\dtd\dad.dtd"> 
<DAD> 
  ...
<Xcollection> 
<SQL_stmt> 
       ...
</SQL_stmt> 
<prolog>?xml version="1.0"?</prolog> 
<doctype>!DOCTYPE DAD SYSTEM "c:\dxx\sample\dtd\getstart.dtd"</doctype>
<root_node> 
   <element_node name="Order">         --> Identifies the element <Order>
     <attribute_node name="key">       --> Identifies the attribute "key" 
        <column name="order_key"/>     --> Defines the name of the column, "order_key", 
                                                to which the element and attribute are mapped
     </attribute_node> 
     <element_node name="Customer">    --> Identifies a child element of <Order> as 
                                                <Customer>
        <text_node>                    --> Specifies the CDATA text for the element 
                                                <Customer>
          <column name="customer">     --> Defines the name of the column, "customer", 
                                                to which the child element is mapped
        </text_node> 
     </element_node> 
        ...
   </element_node>
 
      ...
<root_node> 
</Xcollection>
</DAD>
 

In this example, the first two columns in the SQL statement have elements and attributes mapped to them.

You can use the XML Extender administration wizard or an editor to create and update the DAD file.

Mapping schemes for XML collections

If you are using an XML collection, you must select a mapping scheme that defines how XML data is represented in a relational database. Because XML collections must match a hierarchical structure that is used in XML documents with a relational structure, you should understand how the two structures compare. Figure 11 shows how the hierarchical structure can be mapped to relational table columns.

Figure 11. XML document structured mapped to relational table columns


XML document structured mapped to relational table columns

The XML Extender uses the mapping scheme when composing or decomposing XML documents that are located in multiple relational tables. The XML Extender provides a wizard that assists you in creating the DAD file. However, before you create the DAD file, you must think about how your XML data is mapped to the XML collection.

Types of mapping schemes

The mapping scheme is specified in the <Xcollection> element in the DAD file. The XML Extender provides two types of mapping schemes: SQL mapping and Relational Database (RDB_node) mapping. Both methods use the XSLT model to define the hierarchy of the XML document.

SQL mapping
Allows simple and direct mapping from relational data to XML documents through a single SQL statement and the XSLT data model. SQL mapping is used for composition; it is not used for decomposition. SQL mapping is defined with the SQL_stmt element in the DAD file. The content of the SQL_stmt is a valid SQL statement. The SQL_stmt maps the columns in the SELECT clause to XML elements or attributes that are used in the XML document. When defined for composing XML documents, the column names in the SQL statement's SELECT clause are used to define the value of an attribute_node or a content of text_node. The FROM clause defines the tables containing the data; the WHERE clause specifies the join and search condition.

The SQL mapping gives DB2 users the power to map the data using SQL. When using SQL mapping, you must be able to join all tables in one SELECT statement to form a query. If one SQL statement is not sufficient, consider using RDB_node mapping. To tie all tables together, the primary key and foreign key relationship is recommended among these tables.

RDB_node mapping
Defines the location of the content of an XML element or the value of an XML attribute so that the XML Extender can determine where to store or retrieve the XML data.

The RDB_node contains one or more node definitions for tables, optional columns, and optional conditions. The tables and columns are used to define how the XML data is to be stored in the database. The condition specifies the criteria for selecting XML data or the way to join the XML collection tables.

To define a mapping scheme, you create a DAD with an <Xcollection> element. Figure 12 shows a fragment of a sample DAD file with an XML collection SQL mapping that composes a set of XML documents from data in three relational tables.

Figure 12. SQL mapping scheme

<?xml version="1.0"?> 
<!DOCTYPE DAD SYSTEM "c:\dtd\dad.dtd"> 
<DAD> 
  <dtdid>c:\dxx\samples\dad\getstart.dtd</dtdid> 
  <validation>YES</validation> 
  <Xcollection> 
    <SQL_stmt> 
       SELECT o.order_key, customer, p.part_key, quantity, price, tax, date, 
              mode, comment 
         FROM order_tab o, part_tab p, 
               table(select substr(char(timestamp(generate_unique())), 
               as ship_id, date, mode, from ship_tab) as s 
        WHERE p.price > 2500.00 and s.date > "1996-06-01" AND 
              p.order_key = o.order_key and s.part_key = p.part_key 
    </SQL_stmt>
    <prolog>?xml version="1.0"?</prolog>
    <doctype>!DOCTYPE DAD SYSTEM "c:\dxx\samples\dtd\getstart.dtd"</doctype>
    <root_node>
	   <element_node name="Order">
		  <attribute_node name="key">
			<column_name="order_key"/>
		  </attribute_node>
		  <element_node name="Customer">
			  <text_node>
				  <column name="customer"/>
			  </text_node>
		  <element_node>
		
		...
 
		  </element_node><!-end Part->
	   </element_node><!-end Order->
    </root_node>
  </Xcollection>
</DAD>
 

The XML Extender provides several stored procedures that manage data in an XML collection. These stored procedures support both types of mapping, but require that the DAD file follow the rules that are described in Mapping scheme requirements.

Mapping scheme requirements

The following sections describe requirements for each type of the XML collection mapping schemes.

Requirements when using SQL mapping

In this mapping scheme, you must specify the SQL_stmt element in the DAD <Xcollection> element. The SQL_stmt should contain a single SQL statement that can join multiple relational tables with the query predicate. In addition, the following clauses are required:

Although the SQL_stmt has the preceding requirements, it is powerful because you can specify any predicate in your WHERE clause, as long as the expression in the predicate uses the columns in the tables.

Requirements when using RDB_node mapping

When using this mapping method, do not use the element SQL_stmt in the <Xcollection> element of the DAD file. Instead, use the RDB_node element in each of the top nodes for element_node and for each attribute_node and text_node.

With the RDB_node mapping approach, you don't need to supply SQL statements. However, putting complex query conditions in the RDB_node element can be more difficult. For example, using a union, expression, or operation is somewhat less powerful than the SQL-to-XML approach.

Decomposition table size requirements

Decomposition uses RDB_node mapping to specify how an XML document is decomposed into DB2 tables by extracting the element and attribute values into table rows. The values from each XML document are stored in one or more DB2 tables. Each table can have a maximum of 1024 rows decomposed from each document.

For example, if an XML document is decomposed into five tables, each of the five tables can have up to 1024 rows for that particular document. If the table has rows for multiple documents, it can have up to 1024 rows for each document. If the table has 20 documents, it can have 20,480 rows, 1024 for each document.

Using multiple-occuring elements (elements with location paths that can occur more than once in the XML structure) affects the number of rows. For example, a document that contains an element <Part> that occurs 20 times, might be decomposed as 20 rows in a table. When using multiple occuring elements, consider this table size restriction.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]