XSLT stylesheets and expressions in XQuery and XPath can
refer to collations using collation URIs. A collation is a set of
culture-specific rules that define how text should be sorted and which
differences between two pieces of text are considered significant
and which insignificant.
Before you begin
This article assumes some basic familiarity with the java.util.Locale
and java.text.Collator classes.
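As a brief refresher on those two classes, the following sketch uses only the standard Java library (not the XML API) to show how the strength setting of a Collator decides which differences between strings count as significant. The class name CollatorStrengthDemo and the helper compareAt are made up for this illustration.

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorStrengthDemo {
    /** Compares two strings with an English collator set to the given strength. */
    static int compareAt(int strength, String a, String b) {
        // Clone before changing the strength so that no shared instance is modified
        Collator collator = (Collator) Collator.getInstance(Locale.ENGLISH).clone();
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Case is a tertiary difference, so it is ignored at SECONDARY strength
        System.out.println("SECONDARY equal: "
            + (compareAt(Collator.SECONDARY, "abc", "ABC") == 0)); // prints true
        System.out.println("TERTIARY equal: "
            + (compareAt(Collator.TERTIARY, "abc", "ABC") == 0));  // prints false
    }
}
```

At SECONDARY strength the two spellings compare as equal; at TERTIARY strength the difference in case becomes significant.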
About this task
The processor does not interpret the collation URI in
any way -- it treats a collation URI merely as a sort of name for
the instance of the Java Collator
class that is associated with that URI. The XML API provides mechanisms
for specifying what will be the default collation URI at preparation-time
and for associating an instance of the Java Collator
class with a collation URI at execution-time.
All collation
URIs specified through the XML API must be absolute URI references.
In an XSLT stylesheet or an XQuery or XPath expression, any relative
URI reference that is used in a context where a collation URI is required
is resolved against the base URI from the static context for
that expression. This ensures that even relative URI references
in the stylesheet or expression can be matched with the absolute URI
references specified through the XML API.
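The resolution of a relative collation URI can be pictured with java.net.URI from the standard library. This is only an illustration of relative-reference resolution, not the processor's own code; the base URI, the relative reference and the class name are made-up values.

```java
import java.net.URI;

final class CollationUriResolution {
    /** Resolves a collation URI reference against a base URI. */
    static String resolve(String baseUri, String collationRef) {
        return URI.create(baseUri).resolve(collationRef).toString();
    }

    public static void main(String[] args) {
        // Hypothetical base URI of a stylesheet
        String base = "http://example.org/stylesheets/main.xsl";

        // A relative reference is resolved against the base URI
        System.out.println(resolve(base, "collations/english"));
        // prints http://example.org/stylesheets/collations/english

        // An absolute reference is left unchanged
        System.out.println(resolve(base, "http://example.org/my-collation"));
        // prints http://example.org/my-collation
    }
}
```

Under these assumptions, the absolute form printed by the first call is the string that would have to be supplied through the XML API for the two references to match.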
Limitations:
- If a collation URI is bound to an instance of the Java Collator class that is not an instance
of java.text.RuleBasedCollator, certain operations are not permitted
with that collation URI. In particular, the fn:starts-with, fn:ends-with,
fn:contains, fn:substring-before and fn:substring-after functions
are not supported with that collation URI.
- All instances of Collator that are currently included with the Java runtime environment are also
instances of java.text.RuleBasedCollator, so for most purposes this is
only a theoretical limitation. However, it is something to be aware
of if an application defines its own instances of the Java Collator class or defines subclasses of the
Collator class that are not also instances of java.text.RuleBasedCollator.
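An application could check ahead of time whether a Collator it intends to bind falls under this limitation. The sketch below uses only the standard Java library; FoldingCollator is a hypothetical custom subclass written for this illustration, and supportsSubstringFunctions is a made-up helper name, not part of the XML API.

```java
import java.nio.charset.StandardCharsets;
import java.text.CollationKey;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

final class CollatorKindCheck {
    /** Hypothetical custom collator: lower-cases both strings, then compares them. */
    static final class FoldingCollator extends Collator {
        private static String fold(String s) {
            return s.toLowerCase(Locale.ROOT);
        }

        @Override
        public int compare(String a, String b) {
            return fold(a).compareTo(fold(b));
        }

        @Override
        public CollationKey getCollationKey(String source) {
            final String folded = fold(source);
            return new CollationKey(source) {
                @Override
                public int compareTo(CollationKey other) {
                    return folded.compareTo(fold(other.getSourceString()));
                }

                @Override
                public byte[] toByteArray() {
                    return folded.getBytes(StandardCharsets.UTF_16BE);
                }
            };
        }

        @Override
        public int hashCode() {
            return FoldingCollator.class.hashCode();
        }
    }

    /** True if the collator is a RuleBasedCollator, the case the limitation does not affect. */
    static boolean supportsSubstringFunctions(Collator collator) {
        return collator instanceof RuleBasedCollator;
    }

    public static void main(String[] args) {
        System.out.println(supportsSubstringFunctions(Collator.getInstance(Locale.ENGLISH)));
        System.out.println(supportsSubstringFunctions(new FoldingCollator())); // prints false
    }
}
```

A collator such as FoldingCollator could still be used for comparisons and sorting, but not with the five functions listed above.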
Procedure
- Declare the default collation URI.
You can
specify the collation URI to use as the default for string
comparison operations by calling the setDefaultCollation method
on the XStaticContext interface. The default collation URI from the
XStaticContext interface is used as the collation URI in string
comparison operations that do not explicitly specify a collation URI.
An XQuery expression can override the default collation URI
specified on the XStaticContext interface with the declare default
collation declaration. Similarly, an XSLT stylesheet can override
the default collation URI with the [xsl:]default-collation attribute.
XPath does not provide a means of overriding the default collation
URI. However, any XPath or XQuery expression or XSLT stylesheet that
performs string comparison operations can specify an explicit collation
URI to override the default collation URI.
If you do not explicitly
specify a default collation on any instance of the XStaticContext
interface you supply when you prepare your XSLT stylesheet or your
XQuery or XPath expression, the default collation URI for the stylesheet
or expression will be http://www.w3.org/2005/xpath-functions/collation/codepoint, the Unicode code-point collation URI.
You
can use the Unicode code-point collation in situations where characters
must be identical Unicode characters to be considered equal.
The lexicographical ordering defined by this collation is determined
by the Unicode code points of the characters -- that is, by their
positions on the Unicode code charts. As a result, the Unicode code-point
collation generally performs much better than collations that
compare strings in a culture-specific manner, but it is unlikely
to give very satisfactory results for sorting operations.
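The difference between the two kinds of ordering can be seen without the XML API. The following sketch, which uses only the standard Java library, sorts the same words once by Unicode code point and once with an English Collator. The class and method names are made up for this illustration, and compareByCodePoint is a hand-written stand-in for code-point ordering, not the processor's implementation.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class CodePointVersusLocaleSort {
    /** Orders strings by the numeric values of their Unicode code points. */
    static int compareByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first
        return Integer.compare(a.length() - i, b.length() - j);
    }

    /** Returns a sorted copy of the list, leaving the input untouched. */
    static List<String> sorted(List<String> words, Comparator<? super String> order) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "Zebra", "banana");

        // 'Z' (U+005A) precedes 'a' (U+0061) in code-point order
        System.out.println(sorted(words, CodePointVersusLocaleSort::compareByCodePoint));
        // prints [Zebra, apple, banana]

        // The English collator orders by letter first and by case only as a tie-breaker
        System.out.println(sorted(words, Collator.getInstance(Locale.ENGLISH)));
        // prints [apple, banana, Zebra]
    }
}
```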
The following is a simple example showing how to specify
the default collation URI on an instance of the XStaticContext interface.
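import com.ibm.xml.xapi.XFactory;
import com.ibm.xml.xapi.XPathExecutable;
import com.ibm.xml.xapi.XStaticContext;

// The default collation URI is not changed here, so it remains
// the Unicode code-point collation URI
XFactory factory = XFactory.newInstance();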
XPathExecutable maxPath1 = factory.prepareXPath("max($var)");
// A new default collation URI is specified in the static context
// That URI is used in any string comparison for which no other
// explicit collation URI is specified
XStaticContext sc = factory.newStaticContext();
sc.setDefaultCollation("http://example.org/my-collation");
XPathExecutable maxPath2 = factory.prepareXPath("max($var)", sc);
- Bind a collation URI.
The XML API provides
two methods for binding a collation URI to an instance of the Java Collator class for an execution.
The bindCollation methods on the XDynamicContext interface take two arguments:
the first argument is a collation URI; the second is either an instance
of the java.text.Collator class or an instance of the java.util.Locale
class. If an instance of the Locale class is specified, the processor
uses the instance of the Collator class that is appropriate for
that locale.
XSLT, XPath and XQuery define the concept of "Statically
Known Collations". If a reference to a collation URI appears in
an XSLT stylesheet or an XPath or XQuery expression, and the collation
URI is not one of the Statically Known Collations, a static error
is supposed to be reported in some circumstances. However, the processor
treats all collation URIs as if they were in the set of Statically
Known Collations, because instances of the Java Collator class are not
associated with collation URIs until execution time, so the processor
cannot determine statically which collation URIs are unknown. Instead,
the processor reports a dynamic error if a stylesheet or expression uses
a collation URI that is not bound to an instance of the Collator class.
You
cannot bind the Unicode code-point collation URI to any instance of
the Java Collator class. That URI
is always implicitly bound to the Unicode code-point collation.
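The binding rules described in this step can be summarized as a small model. The sketch below is not the XML API and not how the processor is implemented; CollationBindings, its methods and the exceptions it throws are assumptions made for illustration. It treats each collation URI as an opaque name, requires that name to be absolute, refuses to rebind the code-point collation URI, and defers the "unknown collation" failure to lookup time.

```java
import java.net.URI;
import java.text.Collator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CollationBindings {
    static final String CODEPOINT =
        "http://www.w3.org/2005/xpath-functions/collation/codepoint";

    // The URI is never interpreted; it is only a key that names a Collator
    private final Map<String, Collator> bound = new HashMap<>();

    void bind(String uri, Collator collator) {
        if (!URI.create(uri).isAbsolute()) {
            throw new IllegalArgumentException("Collation URI must be absolute: " + uri);
        }
        if (CODEPOINT.equals(uri)) {
            throw new IllegalArgumentException("The code-point collation URI cannot be rebound");
        }
        bound.put(uri, collator);
    }

    void bind(String uri, Locale locale) {
        // A Locale is shorthand for the Collator appropriate for that locale
        bind(uri, Collator.getInstance(locale));
    }

    Collator lookup(String uri) {
        // The code-point collation is handled separately and is not modelled here
        Collator collator = bound.get(uri);
        if (collator == null) {
            // Stands in for the dynamic error reported for an unbound collation URI
            throw new IllegalStateException("No Collator bound to " + uri);
        }
        return collator;
    }

    public static void main(String[] args) {
        CollationBindings bindings = new CollationBindings();
        bindings.bind("http://example.org/my-collation", Locale.ENGLISH);
        System.out.println(bindings.lookup("http://example.org/my-collation") != null); // prints true
    }
}
```

In this model nothing is checked when an expression is prepared; a missing binding surfaces only when lookup is called, which mirrors the dynamic error described above.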
The following example demonstrates how you can bind a
collation URI with a specific instance of the Java Collator class on an instance of the XDynamicContext
interface.
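import java.text.Collator;
import java.util.Locale;
import javax.xml.namespace.QName;
import com.ibm.xml.xapi.XDynamicContext;
import com.ibm.xml.xapi.XFactory;
import com.ibm.xml.xapi.XPathExecutable;
import com.ibm.xml.xapi.XSequenceCursor;
import com.ibm.xml.xapi.XStaticContext;

XFactory factory = XFactory.newInstance();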
XStaticContext sc = factory.newStaticContext();
// Set up a default collation URI
sc.setDefaultCollation("http://example.org/my-collation");
// Prepare an XPath expression that computes fn:max() using the
// collator associated with the default collation URI and again using
// the Unicode code point collation
String expr =
    "max($var)," +
    "max($var,'http://www.w3.org/2005/xpath-functions/collation/codepoint')";
XPathExecutable maxPath =
    factory.prepareXPath(expr, sc);
XDynamicContext dc = factory.newDynamicContext();
// Set the value of the variable $var
dc.bind(new QName("var"),
        new String[] {"encyclopaedia",
                      // U+00E6 is lower case latin ae ligature
                      "encyclop\u00E6dia",
                      "encyclopedia"});
// Set up a Collator for English that does not distinguish between
// capitals, lower-case letters and certain character variants
Collator english =
    (Collator) Collator.getInstance(Locale.ENGLISH).clone();
english.setStrength(Collator.SECONDARY);
// Evaluate the expression with that English collator associated with
// the default collation URI
dc.bindCollation("http://example.org/my-collation", english);
XSequenceCursor maxValues = maxPath.execute(dc);
// Print maximum values - expected results are
// encyclopedia for English collation and
// encyclop\u00E6dia for Unicode code point collation
if (maxValues != null) {
    do {
        System.out.println(maxValues.getStringValue());
    } while (maxValues.toNext());
}