Release Notes for LotusXSL

Document Author: Scott Boag
Document Date: September 2, 1999
Software Version: 0.18.1 [02-September-1999]

1. Disclaimer

LotusXSL is not at this time a product. For the moment the term 'Developer Preview' is the best description of it. It implements a draft standard that is still in design. APIs will continue to change, with no promise of backwards compatibility. This software contains known and unknown bugs (well... I think all software does...). I would recommend against using this software for mission-critical applications. Lotus Development Corporation will not take responsibility for any disaster that may ensue from using this software.


2. Introduction

XSL is a language for expressing stylesheets and other types of transformations. It consists of two parts:

  1. XSLT is a language for transforming XML documents into HTML, XML, text, or other types of documents. It is composed of the XML XSLT vocabulary, and XPath, a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer.
  2. An XML vocabulary for specifying formatting semantics (called Formatting Objects).

At this time, LotusXSL implements only the first part of XSL: the W3C Working Draft of 13 August 1999, XSL Transformations (XSLT).

An XSLT stylesheet specifies the transformation of a class of XML documents by describing how an instance of the class is transformed into another XML tree of nodes.

LotusXSL uses Version 2.0.15 (preferred) or 1.1.16 of IBM's XML for Java (called "XML4J") to parse an input XML document, or it can be adapted to other DOM-producing mechanisms. LotusXSL produces SAX events, an output DOM, or an XML result document based on the transformations specified in the XSL stylesheet.

The processor can be used from the command line, from a servlet, or from a wrapper applet.

Version 0.18.0 is a radical rearchitecture of LotusXSL. In addition to support for the new draft, the old QueryEngine has been re-written and broken into an XPath package, the liaisons and formatters have been moved to a com.lotus.xml package, the Stylesheet classes have been separated from the main processor and are reusable and serializable, and the XSL instruction handlers have been moved into separate classes. In addition, we are working on an XML Query API, and a driver interface for data providers. In general, this release should perform much faster.

LotusXSL is now thread-safe as long as each thread uses its own instance. You cannot run the same instance in multiple threads.
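
The one-instance-per-thread rule can be sketched as follows. This is only an illustration of the pattern, not LotusXSL code: StubProcessor is a hypothetical stand-in for whatever processor object your application creates, and ThreadLocal is the standard JDK class used here to hand each thread its own instance.

```java
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadProcessor {
    /** Hypothetical stand-in for a processor object that must not be shared between threads. */
    static final class StubProcessor {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();
        final int id = NEXT_ID.incrementAndGet();
    }

    // Each thread lazily creates, and then keeps, its own instance.
    private static final ThreadLocal<StubProcessor> LOCAL =
            ThreadLocal.withInitial(StubProcessor::new);

    /** Returns the instance that belongs to the calling thread. */
    static StubProcessor current() {
        return LOCAL.get();
    }

    /** Helper for the demo: reports the id of the instance seen by a fresh thread. */
    static int idOnNewThread() {
        int[] holder = new int[1];
        Thread worker = new Thread(() -> holder[0] = current().id);
        worker.start();
        try {
            worker.join(); // join() makes the worker's write visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder[0];
    }

    public static void main(String[] args) {
        System.out.println("same thread, same instance: " + (current() == current()));
        System.out.println("other thread, other instance: " + (current().id != idOnNewThread()));
    }
}
```

With this arrangement no instance is ever visible to more than one thread, so no locking is needed around the processor itself.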

Note that stylesheets can now be compiled (into an internal format) and serialized to disk. The resulting disk files are *big* (I mean really big), but this feature might be useful to someone.

See XQuery Overview for information about the XML query architecture.

In the SQL directory is an implementation of an XLocator object that interfaces with JDBC databases to return a canonical row-set XML. This will allow transformation of SQL data to any XML or HTML. The rows try to be lazily evaluated, though I have some more work to do to make this work well.

The code includes base servlet support for applying XSL stylesheets (retrieved from various sources) to XML (also retrieved from various sources). See the DefaultApplyXSL class. (Many thanks to the WebSphere folks for this.)

Stylesheets that worked with the previous version of LotusXSL will have to be upgraded to match the new draft. Watch for stylesheets that declared <xsl:stylesheet> with the xmlns:xsl="http://www.w3.org/TR/WD-xsl" namespace declaration. The namespace should now be xmlns:xsl="http://www.w3.org/XSL/Transform/1.0".
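
As a rough sketch of the namespace part of that upgrade, the following Java snippet rewrites the old namespace URI to the new one in stylesheet text. The class and method names are made up for illustration, and the snippet only touches the quoted namespace URI. Any instruction syntax that changed between drafts still has to be updated by hand.

```java
class NamespaceUpgrade {
    static final String OLD_NS = "http://www.w3.org/TR/WD-xsl";
    static final String NEW_NS = "http://www.w3.org/XSL/Transform/1.0";

    /** Replaces the old namespace URI where it appears as a complete quoted attribute value. */
    static String upgrade(String stylesheetText) {
        return stylesheetText
                .replace("\"" + OLD_NS + "\"", "\"" + NEW_NS + "\"")
                .replace("'" + OLD_NS + "'", "'" + NEW_NS + "'");
    }

    public static void main(String[] args) {
        String before = "<xsl:stylesheet xmlns:xsl=\"" + OLD_NS + "\">";
        System.out.println(upgrade(before));
    }
}
```

Matching the URI together with its surrounding quotes keeps the replacement from touching longer URIs that merely start with the old one.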

LotusXSL contains a DTD for XSL when used for HTML. (Thanks be to Henrique M. Holschuh for this.) This DTD should not be used for run-time production when performance is a concern, but is useful during the development stages. It is used in the readme XSL.

A local copy of the IBM LotusXSL Commercial License Agreement is included.

Special thanks go to Sanjiva Weerawarana of IBM Research, who has implemented the extension mechanism of LotusXSL, and has helped me in countless other ways with this software.

Thanks go to Thomas Rowe, Don Day, David Marston, David A Epstein, Noah Mendelsohn, Dean Burson, Joseph Kesselman, Pat O'Connor, Mike Pogue, Chip Faulkner, Paul Dick, Adam Peller, David Bertoni, the XSL WG, and others, for the support, help, and feedback they have given! Also, many thanks to all the folks who have downloaded LotusXSL so far, and given me feedback, whether good or bad, and bug reports! Special thanks to Dan, Henrique, Adam, and Craig!


3. Installation Instructions

  1. Run the following command to extract the files:

    tar xvf lotusxsl_0_18_1.tar

    This command creates a lotusxsl_0_18_1 sub-directory in the current directory containing all the files.

  2. Make sure to have XML4J on the system class path

To test, you can type run -in test.xml -xsl test.xsl. You should see a bunch of HTML output.

In the LotusXSL root directory there is a 'run' batch file, to which you pass command-line parameters. These instructions use version 2.0.15 of XML4J with the DOM classes. To use XML4J 1.1.16 you'll need to change the batch files to call com.lotus.xsl.Process or, to use the 2.0.15 version with the TX classes, call com.lotus.xsl.xml4j2tx.XML4JLiaison.

The command line utility is com.lotus.xsl.Process. It can take the following command line switches:

        -IN inputXMLURL
        -XSL XSLTransformationURL
        -OUT outputFileName
        -LXCIN compiledStylesheetFileNameIn
        -LXCOUT compiledStylesheetFileNameOut (it will be big!)
        -PARSER fully qualified class name of parser liaison
        -E (Do not expand entity refs)
        -V (Version info)
        -QC (Quiet Pattern Conflicts Warnings)
        -Q  (Quiet Mode)
        -LF (Use linefeeds only on output {default is CR/LF})
        -CR (Use carriage returns only on output {default is CR/LF})
        -ESCAPE (Which characters to escape {default is <>&"'\r\n})
        -INDENT (Control how many spaces to indent {default is 0})
        -TT (Trace the templates as they are being called)
        -TG (Trace each result tree generation event.)
        -TS (Trace each selection event.)
        -TTC (Trace the template children as they are being processed)
        -VALIDATE (Set whether validation occurs.  Validation is off by default)
        -EDUMP (Do stackdump on error)
        -XML (Use XML formatter and add XML header)
        -TEXT (Use simple Text formatter)
        -HTML (Use HTML formatter)
        -PARAM name expression (Set a stylesheet parameter)
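
If you want to drive the command-line utility from another Java program, the argument list can be assembled as in the sketch below. It uses only switches from the list above. The file names test.xml and test.xsl come from the test command earlier in this section, while test.html and the class name ProcessCommandLine are made up for illustration. The sketch only builds and prints the command. Actually launching it assumes that LotusXSL and XML4J are on the class path.

```java
import java.util.ArrayList;
import java.util.List;

class ProcessCommandLine {
    /** Builds a command that transforms inputUrl with stylesheetUrl into an HTML file. */
    static List<String> build(String inputUrl, String stylesheetUrl, String outputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("com.lotus.xsl.Process");
        cmd.add("-IN");
        cmd.add(inputUrl);
        cmd.add("-XSL");
        cmd.add(stylesheetUrl);
        cmd.add("-OUT");
        cmd.add(outputFile);
        cmd.add("-HTML"); // select the HTML formatter
        return cmd;
    }

    public static void main(String[] args) {
        // test.html is a hypothetical output file name.
        List<String> cmd = build("test.xml", "test.xsl", "test.html");
        System.out.println(String.join(" ", cmd));
        // To run it for real: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```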
      

4. Samples

See the Demonstration Page for some fairly simple examples of producing XML-styled HTML. [NOTE: This requires Java 1.1 and so will not work on earlier versions of browsers that only support Java 1.0.] Most of the XML and XSL files that are used are in the 'samples' directory. However, probably the best sample is the XSL and XML that produced this file. It is located in the 'readme_production' directory.

I hope to be getting some more work done on better samples over time. If you come up with something cool that you wouldn't mind sharing, please let me know!


5. Test Suite

Our test suite is being moved into a separate download on AlphaWorks, and will no longer be contained in the LotusXSL zip file.


6. Performance Notes

We did a great deal to improve performance in version 0.18.0. However, as these things often go, we didn't quite make our goal of a 10x performance increase. This list of what we understand the bottlenecks to be is provided so that users can understand the issues and, in some cases, work around them. The bottlenecks seem to be:

Class load and JIT (Just In Time compiler)
As we've added architecture to improve raw transform performance, we've also increased the number of classes and instances that need loading, and the amount of code that needs JITing. Although we can shrink the code size a bit, the time to load the processor will continue to be an issue. The one thing the caller of the API can do is to pre-warm the JIT by transforming a dummy document. An example of this technique can be found in the com.lotus.xsl.client.LotusXSLControl class, in the init() function, where I try to pre-warm the JIT so that click response time is reasonable.
Stylesheet Compilation
LotusXSL now reads the stylesheet from the parser's SAX events (unless you give it a DOM, in which case it produces SAX events from the DOM), and builds an internal tree structure. This structure processes the attributes, precompiles the XPath expressions, and does other things that can be done before it knows about the input tree. The construction of large stylesheets is not as fast as we would like. We have been focusing on the transformation itself, and will have to concentrate on optimizing the stylesheet compilation down the road a bit.
The DOM
DOM nodes are big. Every node, including attribute nodes and whitespace nodes, has a next, previous, parent, first child, last child, ownerDocument, name, value, userData, and a few internal flags. So building the input DOM can take a lot of memory management. We are working on ways to get around this when LotusXSL builds the input tree, while still maintaining our DOM input architecture.
Union Sorting in Document Order
XSLT has to return nodes in document order. Determining document order from a pure DOM is difficult. We have developed a method to determine the order that is pretty good, but it is not as fast as simply determining document order based on an integer. Also, the collation method of combining two node lists is not as efficient as it needs to be, simply because we haven't had time to address it yet (the issues are more complex than is first apparent). The net result is that union expressions aren't scalable to large node sets. For instance, we have one user who has a large document with 3800 elements right under the root node, and they apply a standard select="*|@*|comment()|processing-instruction()|text()" expression to these nodes. The 3800 elements happen to have 3800 whitespace nodes interleaved between them, and the union selection has to create a nodelist of 3800 elements and 3800 whitespace nodes, and combine them, testing for document order as it goes. The result is that the processing takes forever. We're working on ways to fix this, but the answer for this user was to not do the union selection at the top level (they didn't care about the whitespace nodes anyway).
Selects in Large Documents
LotusXSL currently builds no inverted indexes for the input document, does not take advantage of DTD structures to optimize, and otherwise does no special magic to locate nodes other than try to implement efficient algorithms. While inverted indexes and taking advantage of DTD structures are on our list of options, users should be aware that patterns such as '//foo' at the top of the tree can be expensive.
xsl:number and xsl:sort
Neither sorting nor numbering is as efficient as it could be.
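
The pre-warming technique mentioned under "Class load and JIT" can be sketched roughly as follows. This is not the code from LotusXSLControl.init(). As a stand-in it uses the JDK's own javax.xml.transform API, so it runs without LotusXSL installed. Note that the JDK processor expects the final XSLT 1.0 namespace rather than the draft namespace used by this release. The class name, the dummy documents, and the round count are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class JitPrewarm {
    // Tiny throwaway stylesheet and input, used only to exercise the code paths.
    static final String DUMMY_XSL =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:value-of select=\"count(//item)\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    static final String DUMMY_XML = "<doc><item/><item/></doc>";

    /** Runs one complete transformation and returns the result as a string. */
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("dummy transform failed", e);
        }
    }

    /** Transforms the dummy document a few times so classes are loaded and hot paths compiled. */
    static void prewarm(int rounds) {
        for (int i = 0; i < rounds; i++) {
            transform(DUMMY_XSL, DUMMY_XML);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        prewarm(5); // hypothetical round count; tune for your application
        long warmupMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("warm-up took " + warmupMillis + " ms");
        System.out.println("dummy result: " + transform(DUMMY_XSL, DUMMY_XML));
    }
}
```

Calling something like prewarm() during application start-up, or in an applet's init() as the text describes, moves the class loading and JIT cost away from the first user-visible transformation.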

7. API

Please see the API Overview document for information about the LotusXSL Application Programmer's Interface and the source code.

The API Documentation is in the 'apidocs' directory.


8. Tips


9. Extension Mechanism

See the LotusXSL Extension Mechanism Doc for instructions on how you can extend XSLT with Java and JavaScript.


10. Proprietary Extensions

10.1. Redirect Extension (Multiple Output Docs)

LotusXSL has a built-in extension that lets you produce multiple output documents from a single stylesheet. The extension is limited right now to producing files, when the output method is "xml", "html", or "text". See the Redirect class for details.

11. Version Notes

11.1. Not Yet Implemented

This section will give you an idea of what I haven't implemented yet.

11.2. Known Bugs

Bugs I haven't been able to fix yet for one reason or another.

11.3. Features and Enhancements To Do

There are lots of potential features that will be going into the draft. Rather than trying to enumerate these, I'll enumerate what I plan to do that is not driven by the draft.

11.4. Changes Since Last Version

September 2, 1999 (Version 0.18.1)

August 28, 1999 (Version 0.18.0d14)

August 26, 1999 (Version 0.18.0d13)

August 24, 1999 (Version 0.18.0d11)

August 19, 1999 (Version 0.18.0d10)

August 16, 1999 (Version 0.18.0d9)

August 08, 1999 (Version 0.18.0d8)

August 02, 1999 (Version 0.18.0d7)

July 18, 1999 (Version 0.18.0d6)

July 14, 1999 (Version 0.18.0d5)

July 14, 1999 (Version 0.18.0d4)

July 13, 1999 (Version 0.18.0d3)

July 12, 1999 (Version 0.18.0d2)

June 24, 1999 (Version 0.17.3)

June 15, 1999 (Version 0.17.2)

May 24, 1999 (Version 0.17.1)

May 3, 1999 (Version 0.17.0)

April 7, 1999 (Version 0.16.4)

April 2, 1999 (Version 0.16.4)

March 22, 1999 (Version 0.16.4)

March 11, 1999

March 11, 1999 (Version 0.16.4d1)

March 11, 1999 (Version 0.16.3)

March 10, 1999

March 8, 1999

March 3, 1999

March 2, 1999

February 28, 1999

February 26, 1999

February 08, 1999 (Version 0.16.2)

February 01, 1999

January 25, 1999

January 21, 1999

January 15, 1999

December 20, 1998

December 17, 1998

December 15, 1998

December 11, 1998

November 28, 1998 to 18-May-1998

  1. Details deleted.

18-May-1998 to 30-June-1998

  1. Old submission version. Details deleted.

12. Contact Info and Bug Reports

  1. Contact information for LotusXSL: Scott Boag

13. Glossary

XSL Instruction
Any tag with an XSL namespace prefix.
XSL Template Instruction
Any tag with an XSL namespace prefix that occurs inside an xsl:template element.
Template Child
Any node that is a child of an xsl:template element.
Source Tree
The tree input to the XSL process.
Result Tree
The tree that is output by the XSL process.
Stylesheet Tree
The stylesheet tree produced from the XSL file.
Pattern List
A parsed query or match pattern.