com.ibm.icu.text
Class Collator

java.lang.Object
  |
  +--com.ibm.icu.text.Collator
All Implemented Interfaces:
java.lang.Cloneable, java.util.Comparator
Direct Known Subclasses:
RuleBasedCollator

public abstract class Collator
extends java.lang.Object
implements java.util.Comparator, java.lang.Cloneable

Collator performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.

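Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are 5 different levels of strength used in comparisons:

PRIMARY: typically used to denote differences between base characters. This is the strongest level of difference.
SECONDARY: accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language.
TERTIARY: upper and lower case differences are distinguished at this level. In addition, a variant of a letter differs from the base form on the tertiary level.
QUATERNARY: when punctuation is ignored (see Ignoring Punctuations in the user guide) at PRIMARY to TERTIARY strength, this additional level can be used to distinguish words with and without punctuation.
IDENTICAL: when all other levels are equal, this level is used as a tiebreaker; the Unicode code point values of the NFD form of each string are compared.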

Unlike the JDK, ICU4J's Collator supports only 2 decomposition modes: canonical decomposition and no decomposition. The JDK's compatibility decomposition mode, java.text.Collator.FULL_DECOMPOSITION, is not supported. If canonical decomposition is set, the Collator handles un-normalized text properly, producing the same results as if the text were normalized in NFD. If canonical decomposition is turned off, it is the user's responsibility to ensure that all text is already in the appropriate form before performing a comparison or before getting a CollationKey.
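When canonical decomposition is off, one way for the caller to put text into the appropriate form is to normalize both strings to NFD before comparing. The sketch below is an assumption, not ICU4J code: it uses the JDK's java.text.Collator and java.text.Normalizer as stand-ins for the ICU4J classes, and the class NfdCompare is hypothetical.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper class; the JDK Collator stands in for the ICU4J one.
class NfdCompare {
    // Normalize both inputs to NFD, then compare them. This is the caller's job
    // when the collator's decomposition mode is NO_DECOMPOSITION.
    static int compareNormalized(Collator collator, String source, String target) {
        String s = Normalizer.normalize(source, Normalizer.Form.NFD);
        String t = Normalizer.normalize(target, Normalizer.Form.NFD);
        return collator.compare(s, t);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.NO_DECOMPOSITION);
        // U+00E0 U+0325 and U+0061 U+0325 U+0300 have the same NFD form,
        // so after normalization the collator sees identical strings.
        System.out.println(compareNormalized(collator, "\u00e0\u0325", "a\u0325\u0300")); // prints 0
    }
}
```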

For more information about the collation service, see the ICU User Guide.

Examples of use

 import com.ibm.icu.text.Collator;
 import java.util.Locale;

 // Get the Collator for US English and set its strength to PRIMARY
 Collator usCollator = Collator.getInstance(Locale.US);
 usCollator.setStrength(Collator.PRIMARY);
 if (usCollator.compare("abc", "ABC") == 0) {
     System.out.println("Strings are equivalent");
 }
 
 The following example shows how the decomposition mode affects the
 comparison of two canonically equivalent strings, using the Collator
 for the default locale.

 // Compare two canonically equivalent strings in the default locale:
 // U+00E0 U+0325 and U+0061 U+0325 U+0300 have the same NFD form.
 Collator myCollator = Collator.getInstance();
 myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
 if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
     System.out.println("The strings are not equal without decomposition");
     myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
     if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
         System.out.println("Error: the strings should be equal with decomposition");
     }
     else {
         System.out.println("The strings are equal with decomposition");
     }
 }
 else {
     System.out.println("Error: the strings should not be equal without decomposition");
 }
 

Since:
release 2.2, April 18 2002
Author:
Syn Wee Quek
See Also:
RuleBasedCollator, CollationKey

Field Summary
static int CANONICAL_DECOMPOSITION
          Decomposition mode value.
static int IDENTICAL
           Weakest collator strength value, used only as a tiebreaker.
static int NO_DECOMPOSITION
          Decomposition mode value.
static int PRIMARY
          Strongest collator strength value.
static int QUATERNARY
          Fourth level collator strength value.
static int SECONDARY
          Second level collator strength value.
static int TERTIARY
          Third level collator strength value.
 
Constructor Summary
Collator()
           
 
Method Summary
 int compare(java.lang.Object source, java.lang.Object target)
           Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode.
abstract  int compare(java.lang.String source, java.lang.String target)
           Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode.
abstract  boolean equals(java.lang.Object that)
          Compares the equality of two Collators.
 boolean equals(java.lang.String source, java.lang.String target)
          Convenience method for comparing the equality of two text Strings using this Collator's rules, strength and decomposition mode.
abstract  CollationKey getCollationKey(java.lang.String source)
           Transforms the String into a CollationKey suitable for efficient repeated comparison.
 int getDecomposition()
           Get the decomposition mode of this Collator.
static Collator getInstance()
          Gets the Collator for the current default locale.
static Collator getInstance(java.util.Locale locale)
          Gets the Collator for the desired locale.
 int getStrength()
          Returns this Collator's strength property.
abstract  int hashCode()
          Generates a hash code for this Collator.
 void setDecomposition(int decomposition)
          Set the decomposition mode of this Collator.
 void setStrength(int newStrength)
          Sets this Collator's strength property.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PRIMARY

public static final int PRIMARY
Strongest collator strength value. Typically used to denote differences between base characters. See class documentation for more explanation.
See Also:
setStrength(int), getStrength()

SECONDARY

public static final int SECONDARY
Second level collator strength value. Accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language. See class documentation for more explanation.
See Also:
setStrength(int), getStrength()

TERTIARY

public static final int TERTIARY
Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. In addition, a variant of a letter differs from the base form on the tertiary level. See class documentation for more explanation.
See Also:
setStrength(int), getStrength()
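The effect of the strength setting on a case difference can be sketched with the JDK's java.text.Collator as a stand-in: its PRIMARY and TERTIARY constants play the same roles as the ones documented here, although their numeric values differ. The class StrengthDemo is hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; java.text.Collator stands in for the ICU4J class.
class StrengthDemo {
    // Compare two strings with a fresh US English collator at the given strength.
    static int compareAt(int strength, String source, String target) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(source, target);
    }

    public static void main(String[] args) {
        // Case is a tertiary difference, so PRIMARY strength ignores it...
        System.out.println(compareAt(Collator.PRIMARY, "abc", "ABC"));       // prints 0
        // ...while TERTIARY strength reports the strings as different.
        System.out.println(compareAt(Collator.TERTIARY, "abc", "ABC") != 0); // prints true
    }
}
```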

QUATERNARY

public static final int QUATERNARY
Fourth level collator strength value. When punctuation is ignored (see Ignoring Punctuations in the user guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation. See class documentation for more explanation.
See Also:
setStrength(int), getStrength()

IDENTICAL

public static final int IDENTICAL

Weakest collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker: the Unicode code point values of the NFD form of each string are compared, in case no difference was found at the other levels. See class documentation for more explanation.

Note that this value is different from the JDK's.


NO_DECOMPOSITION

public static final int NO_DECOMPOSITION

Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator.

Note that this value is different from the JDK's.

See Also:
CANONICAL_DECOMPOSITION, getDecomposition(), setDecomposition(int)

CANONICAL_DECOMPOSITION

public static final int CANONICAL_DECOMPOSITION

Decomposition mode value. With CANONICAL_DECOMPOSITION set, characters that are canonical variants according to the Unicode standard will be decomposed for collation.

CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.

See Also:
NO_DECOMPOSITION, getDecomposition(), setDecomposition(int)
Constructor Detail

Collator

public Collator()
Method Detail

setStrength

public void setStrength(int newStrength)

Sets this Collator's strength property. The strength property determines the minimum level of difference considered significant during comparison.

The default strength for the Collator is TERTIARY, unless specified otherwise by the locale used to create the Collator.

See the Collator class description for an example of use.

Parameters:
newStrength - the new strength value.
Throws:
java.lang.IllegalArgumentException - if the new strength value is not one of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
See Also:
getStrength(), PRIMARY, SECONDARY, TERTIARY, QUATERNARY, IDENTICAL
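Because setStrength rejects invalid values with IllegalArgumentException, code that reads a strength from configuration may want a fallback. A minimal sketch under that assumption, using the JDK's java.text.Collator (which accepts only PRIMARY, SECONDARY, TERTIARY and IDENTICAL and likewise throws IllegalArgumentException); applyStrength is a hypothetical helper.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper; java.text.Collator stands in for the ICU4J class.
class StrengthSetter {
    // Try the requested strength; if the collator rejects it, apply the fallback.
    // Returns the strength that is actually in effect afterwards.
    static int applyStrength(Collator collator, int requested, int fallback) {
        try {
            collator.setStrength(requested);
        } catch (IllegalArgumentException e) {
            collator.setStrength(fallback);
        }
        return collator.getStrength();
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        // 99 is not a valid strength, so TERTIARY is applied instead.
        System.out.println(applyStrength(collator, 99, Collator.TERTIARY) == Collator.TERTIARY); // prints true
    }
}
```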

setDecomposition

public void setDecomposition(int decomposition)

Set the decomposition mode of this Collator. Setting this decomposition property with CANONICAL_DECOMPOSITION allows the Collator to handle un-normalized text properly, producing the same results as if the text were normalized. If NO_DECOMPOSITION is set, it is the user's responsibility to ensure that all text is already in the appropriate form before a comparison or before getting a CollationKey. Adjusting the decomposition mode allows the user to choose between faster and more complete collation behavior.

Since a great many of the world's languages do not require text normalization, most locales set NO_DECOMPOSITION as the default decomposition mode.

The default decomposition mode for the Collator is NO_DECOMPOSITION, unless specified otherwise by the locale used to create the Collator.

See getDecomposition for a description of decomposition mode.

Parameters:
decomposition - the new decomposition mode
Throws:
java.lang.IllegalArgumentException - If the given value is not a valid decomposition mode.
See Also:
getDecomposition(), NO_DECOMPOSITION, CANONICAL_DECOMPOSITION
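The trade-off between faster and more complete collation can be decided per data set: if every input is already normalized, decomposition can be skipped. The policy below is a hypothetical sketch, not ICU4J behavior; it uses the JDK's java.text.Normalizer and java.text.Collator, and treats NFD as a conservative definition of "the appropriate form".

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical policy class; java.text.Collator stands in for the ICU4J class.
class DecompositionChooser {
    // Skip decomposition (faster) only when every input is already in NFD;
    // otherwise let the collator decompose canonically.
    static int chooseMode(List<String> inputs) {
        for (String s : inputs) {
            if (!Normalizer.isNormalized(s, Normalizer.Form.NFD)) {
                return Collator.CANONICAL_DECOMPOSITION;
            }
        }
        return Collator.NO_DECOMPOSITION;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("resume", "r\u00e9sum\u00e9");
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(chooseMode(words));
        // The precomposed U+00E9 is not in NFD, so canonical decomposition is chosen.
        System.out.println(collator.getDecomposition() == Collator.CANONICAL_DECOMPOSITION); // prints true
    }
}
```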

getInstance

public static final Collator getInstance()
Gets the Collator for the current default locale. The default locale is determined by java.util.Locale.getDefault().
Returns:
the Collator for the default locale (for example, en_US) if it is created successfully. Otherwise, if there is no Collator associated with the default locale, the default UCA collator is returned.
See Also:
Locale.getDefault(), getInstance(Locale)

getInstance

public static final Collator getInstance(java.util.Locale locale)
Gets the Collator for the desired locale.
Parameters:
locale - the desired locale.
Returns:
the Collator for the desired locale if it is created successfully. Otherwise, if there is no Collator associated with the given locale, a default UCA collator is returned.
See Also:
Locale, ResourceBundle, getInstance()
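Whether getInstance returns a shared instance or a fresh copy matters before calling setStrength or setDecomposition on the result. The JDK's java.text.Collator.getInstance returns a fresh copy; the sketch below demonstrates that with the JDK class and assumes, without confirming it here, that ICU4J behaves similarly. The class InstanceIsolation is hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo; java.text.Collator stands in for the ICU4J class.
class InstanceIsolation {
    // Change the strength of one instance, then report the strength of a
    // second instance obtained for the same locale.
    static int strengthOfSecondLookup(Locale locale) {
        Collator first = Collator.getInstance(locale);
        first.setStrength(Collator.PRIMARY);
        Collator second = Collator.getInstance(locale);
        return second.getStrength();
    }

    public static void main(String[] args) {
        // The JDK hands out a fresh copy, so the second lookup keeps the default TERTIARY.
        System.out.println(strengthOfSecondLookup(Locale.US) == Collator.TERTIARY); // prints true
    }
}
```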

getStrength

public int getStrength()

Returns this Collator's strength property. The strength property determines the minimum level of difference considered significant.

See the Collator class description for more details.

Returns:
this Collator's current strength property.
See Also:
setStrength(int), PRIMARY, SECONDARY, TERTIARY, QUATERNARY, IDENTICAL

getDecomposition

public int getDecomposition()

Get the decomposition mode of this Collator. Decomposition mode determines how Unicode composed characters are handled.

See the Collator class description for more details.

Returns:
the decomposition mode
See Also:
setDecomposition(int), NO_DECOMPOSITION, CANONICAL_DECOMPOSITION
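getStrength and getDecomposition make it possible to change a collator's settings temporarily and then restore them. The withSettings helper below is hypothetical and is sketched against the JDK's java.text.Collator, whose getters and setters have the same names as the ones documented here.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical helper; java.text.Collator stands in for the ICU4J class.
class TemporarySettings {
    // Apply a strength and decomposition mode for the duration of the action,
    // then restore whatever settings the collator had before.
    static <T> T withSettings(Collator collator, int strength, int decomposition,
                              Function<Collator, T> action) {
        int savedStrength = collator.getStrength();
        int savedDecomposition = collator.getDecomposition();
        try {
            collator.setStrength(strength);
            collator.setDecomposition(decomposition);
            return action.apply(collator);
        } finally {
            collator.setStrength(savedStrength);
            collator.setDecomposition(savedDecomposition);
        }
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        boolean caseInsensitiveMatch = withSettings(collator, Collator.PRIMARY,
                Collator.CANONICAL_DECOMPOSITION, c -> c.equals("abc", "ABC"));
        System.out.println(caseInsensitiveMatch);                         // prints true
        System.out.println(collator.getStrength() == Collator.TERTIARY);  // prints true
    }
}
```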

compare

public int compare(java.lang.Object source,
                   java.lang.Object target)

Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.

Specified by:
compare in interface java.util.Comparator
Parameters:
source - the source String.
target - the target String.
Returns:
An integer value: less than zero if source is less than target, zero if source and target are equal, and greater than zero if source is greater than target.
Throws:
NullPointerException - thrown if either argument is null.
IllegalArgumentException - thrown if either source or target is not a String.
See Also:
CollationKey, getCollationKey(java.lang.String)
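Because Collator implements java.util.Comparator, an instance can be passed wherever a Comparator is expected, for example to List.sort. The sketch uses the JDK's java.text.Collator, which also implements Comparator, as a stand-in; the class LocaleSort is hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo; java.text.Collator stands in for the ICU4J class.
class LocaleSort {
    // Return a copy of the words sorted with the locale's collator.
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale)); // a Collator is a Comparator
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "C", "a");
        System.out.println(sorted(words, Locale.US)); // prints [a, b, C]
        // For contrast, String's natural order compares UTF-16 code units.
        List<String> byCodeUnit = new ArrayList<>(words);
        byCodeUnit.sort(null);
        System.out.println(byCodeUnit); // prints [C, a, b]
    }
}
```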

equals

public boolean equals(java.lang.String source,
                      java.lang.String target)
Convenience method for comparing the equality of two text Strings using this Collator's rules, strength and decomposition mode.
Parameters:
source - the source string to be compared.
target - the target string to be compared.
Returns:
true if the strings are equal according to the collation rules, otherwise false.
Throws:
NullPointerException - thrown if either argument is null.
See Also:
compare(java.lang.Object, java.lang.Object)
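With the strength set to PRIMARY, equals(String, String) yields a lookup that ignores case differences. The containsEquivalent helper below is hypothetical and uses the JDK's java.text.Collator, which has an equals(String, String) method of the same shape, as a stand-in.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; java.text.Collator stands in for the ICU4J class.
class CollatorLookup {
    // Linear search that treats two strings as the same when the collator
    // reports them equal at its current strength.
    static boolean containsEquivalent(List<String> items, String query, Collator collator) {
        for (String item : items) {
            if (collator.equals(item, query)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> fruit = Arrays.asList("Apple", "banana");
        Collator collator = Collator.getInstance(Locale.US);
        System.out.println(containsEquivalent(fruit, "apple", collator)); // prints false (TERTIARY)
        collator.setStrength(Collator.PRIMARY);
        System.out.println(containsEquivalent(fruit, "apple", collator)); // prints true
    }
}
```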

equals

public abstract boolean equals(java.lang.Object that)
Compares the equality of two Collators.
Specified by:
equals in interface java.util.Comparator
Overrides:
equals in class java.lang.Object
Parameters:
that - the Collator to be compared with this.
Returns:
true if this Collator is the same as that Collator; false otherwise.

hashCode

public abstract int hashCode()
Generates a hash code for this Collator.
Overrides:
hashCode in class java.lang.Object
Returns:
a 32-bit hash code for this Collator
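Since equals(Object) and hashCode are overridden, collators can serve as elements of hash-based collections. The JDK's java.text.Collator, used here as a stand-in, treats two instances as equal when their rules, strength and decomposition mode match; which properties ICU4J's equals compares is not spelled out on this page, so that part is an assumption. The class DistinctCollators is hypothetical.

```java
import java.text.Collator;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo; java.text.Collator stands in for the ICU4J class.
class DistinctCollators {
    // Count how many distinct configurations appear among the given collators.
    static int countDistinct(Collator... collators) {
        Set<Collator> distinct = new HashSet<>();
        for (Collator c : collators) {
            distinct.add(c);
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        Collator a = Collator.getInstance(Locale.US);
        Collator b = Collator.getInstance(Locale.US);   // same rules and settings as a
        Collator c = Collator.getInstance(Locale.US);
        c.setStrength(Collator.PRIMARY);                // differs from a in strength
        System.out.println(countDistinct(a, b, c));     // prints 2
    }
}
```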

compare

public abstract int compare(java.lang.String source,
                            java.lang.String target)

Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.

Parameters:
source - the source String.
target - the target String.
Returns:
An integer value: less than zero if source is less than target, zero if source and target are equal, and greater than zero if source is greater than target.
Throws:
NullPointerException - thrown if either argument is null.
See Also:
CollationKey, getCollationKey(java.lang.String)

getCollationKey

public abstract CollationKey getCollationKey(java.lang.String source)

Transforms the String into a CollationKey suitable for efficient repeated comparison. The resulting key depends on the collator's rules, strength and decomposition mode.

See the CollationKey class documentation for more information.

Parameters:
source - the string to be transformed into a CollationKey.
Returns:
the CollationKey for the given String based on this Collator's collation rules. If the source String is null, a null CollationKey is returned.
See Also:
CollationKey, compare(String, String)
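For repeated comparisons, such as sorting a long list, each string can be converted to a key once and the keys compared instead. The sketch uses the JDK's java.text.Collator and java.text.CollationKey as stand-ins for the ICU4J classes documented here; sortByKeys is a hypothetical name.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; the JDK classes stand in for the ICU4J ones.
class CollationKeySort {
    // Build one key per string, sort the keys, then read back the source strings.
    static List<String> sortByKeys(List<String> words, Collator collator) {
        List<CollationKey> keys = new ArrayList<>();
        for (String w : words) {
            keys.add(collator.getCollationKey(w));
        }
        Collections.sort(keys); // CollationKey is Comparable
        List<String> result = new ArrayList<>();
        for (CollationKey key : keys) {
            result.add(key.getSourceString());
        }
        return result;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        List<String> words = Arrays.asList("cherry", "Banana", "apple");
        System.out.println(sortByKeys(words, collator)); // prints [apple, Banana, cherry]
    }
}
```

Keys are only meaningful when compared with other keys produced by the same collator.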


Copyright (c) 2002 IBM Corporation and others.