XMLString.hpp

/*
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 1999-2001 The Apache Software Foundation.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 *    if any, must include the following acknowledgment:
 *       "This product includes software developed by the
 *        Apache Software Foundation (http://www.apache.org/)."
 *    Alternately, this acknowledgment may appear in the software itself,
 *    if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Xerces" and "Apache Software Foundation" must
 *    not be used to endorse or promote products derived from this
 *    software without prior written permission. For written
 *    permission, please contact apache\@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache",
 *    nor may "Apache" appear in their name, without prior written
 *    permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation, and was
 * originally based on software copyright (c) 1999, International
 * Business Machines, Inc., http://www.ibm.com .  For more information
 * on the Apache Software Foundation, please see
 * <http://www.apache.org/>.
 */

/*
 * $Log: XMLString.hpp,v $
 * Revision 1.23  2001/06/13 14:07:55  peiyongz
 * isValidEncName() to validate an encoding name (EncName)
 *
 * Revision 1.22  2001/05/23 15:44:51  tng
 * Schema: NormalizedString fix.  By Pei Yong Zhang.
 *
 * Revision 1.21  2001/05/11 13:26:31  tng
 * Copyright update.
 *
 * Revision 1.20  2001/05/09 18:43:30  tng
 * Add StringDatatypeValidator and BooleanDatatypeValidator.  By Pei Yong Zhang.
 *
 * Revision 1.19  2001/05/03 20:34:35  tng
 * Schema: SchemaValidator update
 *
 * Revision 1.18  2001/05/03 19:17:35  knoaman
 * TraverseSchema Part II.
 *
 * Revision 1.17  2001/03/21 21:56:13  tng
 * Schema: Add Schema Grammar, Schema Validator, and split the DTDValidator into DTDValidator, DTDScanner, and DTDGrammar.
 *
 * Revision 1.16  2001/03/02 20:52:46  knoaman
 * Schema: Regular expression - misc. updates for error messages,
 * and additions of new functions to XMLString class.
 *
 * Revision 1.15  2001/01/15 21:26:34  tng
 * Performance Patches by David Bertoni.
 *
 * Details: (see xerces-c-dev mailing Jan 14)
 * XMLRecognizer.cpp: the internal encoding string XMLUni::fgXMLChEncodingString
 * was going through this function numerous times.  As a result, the top hot-spot
 * for the parse was _wcsicmp().  The real problem is that Microsoft's wide string
 * functions are unbelievably slow.  For things like encodings, it might be
 * better to use a special comparison function that only considers a-z and
 * A-Z as characters with case.  This works since the character set for
 * encodings is limited to printable ASCII characters.
 *
 * XMLScanner2.cpp: This also has some case-sensitive vs. insensitive compares.
 * They are also much faster.  The other tweak is to only make a copy of an attribute
 * string if it needs to be split.  And then, the strategy is to try to use a
 * stack-based buffer, rather than a dynamically-allocated one.
 *
 * SAX2XMLReaderImpl.cpp: Again, more case-sensitive vs. insensitive comparisons.
 *
 * KVStringPair.cpp & hpp: By storing the size of the allocation, the storage can
 * likely be re-used many times, cutting down on dynamic memory allocations.
 *
 * XMLString.hpp: a more efficient implementation of stringLen().
 *
 * DTDValidator.cpp: another case of using a stack-based buffer when possible
 *
 * These patches made a big difference in parse time in some of our test
 * files, especially the ones that are very attribute-heavy.
 *
 * Revision 1.14  2000/10/13 22:47:57  andyh
 * Fix bug (failure to null-terminate result) in XMLString::trim().
 * Patch contributed by Nadav Aharoni
 *
 * Revision 1.13  2000/04/12 18:42:15  roddey
 * Improved docs in terms of what 'max chars' means in the method
 * parameters.
 *
 * Revision 1.12  2000/04/06 19:42:51  rahulj
 * Clarified how big the target buffer should be in the API
 * documentation.
 *
 * Revision 1.11  2000/03/23 01:02:38  roddey
 * Updates to the XMLURL class to correct a lot of parsing problems
 * and to add support for the port number. Updated the URL tests
 * to test some of this new stuff.
 *
 * Revision 1.10  2000/03/20 23:00:46  rahulj
 * Moved the inline definition of stringLen before the first
 * use. This satisfied the HP CC compiler.
 *
 * Revision 1.9  2000/03/02 19:54:49  roddey
 * This checkin includes many changes done while waiting for the
 * 1.1.0 code to be finished. I can't list them all here, but a list is
 * available elsewhere.
 *
 * Revision 1.8  2000/02/24 20:05:26  abagchi
 * Swat for removing Log from API docs
 *
 * Revision 1.7  2000/02/16 18:51:52  roddey
 * Fixed some facts in the docs and reformatted the docs to stay within
 * a reasonable line width.
 *
 * Revision 1.6  2000/02/16 17:07:07  abagchi
 * Added API docs
 *
 * Revision 1.5  2000/02/06 07:48:06  rahulj
 * Year 2K copyright swat.
 *
 * Revision 1.4  2000/01/12 00:16:23  roddey
 * Changes to deal with multiply nested, relative pathed, entities and to deal
 * with the new URL class changes.
 *
 * Revision 1.3  1999/12/18 00:18:10  roddey
 * More changes to support the new, completely orthogonal support for
 * intrinsic encodings.
 *
 * Revision 1.2  1999/12/15 19:41:28  roddey
 * Support for the new transcoder system, where even intrinsic encodings are
 * done via the same transcoder abstraction as external ones.
 *
 * Revision 1.1.1.1  1999/11/09 01:05:52  twl
 * Initial checkin
 *
 * Revision 1.2  1999/11/08 20:45:21  rahul
 * Swat for adding in Product name and CVS comment log variable.
 *
 */

#if !defined(XMLSTRING_HPP)
#define XMLSTRING_HPP

#include <util/XercesDefs.hpp>
#include <util/RefVectorOf.hpp>

class XMLLCPTranscoder;

class XMLString
{
public:
    /* Static methods for native character mode string manipulation */

    static void binToText
    (
        const   unsigned int    toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned int    toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned long   toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned long   toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   long            toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   long            toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   int             toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   int             toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static bool textToBin
    (
        const   XMLCh* const    toConvert
        ,       unsigned int&   toFill
    );

    static int parseInt
    (
        const   XMLCh* const    toConvert
    );

    static void catString
    (
                char* const     target
        , const char* const     src
    );

    static void catString
    (
                XMLCh* const    target
        , const XMLCh* const    src
    );

    static int compareIString
    (
        const   char* const     str1
        , const char* const     str2
    );

    static int compareIString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
    );

    static int compareNString
    (
        const   char* const     str1
        , const char* const     str2
        , const unsigned int    count
    );

    static int compareNString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
        , const unsigned int    count
    );

    static int compareNIString
    (
        const   char* const     str1
        , const char* const     str2
        , const unsigned int    count
    );

    static int compareNIString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
        , const unsigned int    count
    );

    static int compareString
    (
        const   char* const     str1
        , const char* const     str2
    );

    static int compareString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
    );

    static bool regionMatches
    (
        const   XMLCh* const    str1
        , const int             offset1
        , const XMLCh* const    str2
        , const int             offset2
        , const unsigned int    charCount
    );

    static bool regionIMatches
    (
        const   XMLCh* const    str1
        , const int             offset1
        , const XMLCh* const    str2
        , const int             offset2
        , const unsigned int    charCount
    );

    static void copyString
    (
                char* const     target
        , const char* const     src
    );

    static void copyString
    (
                XMLCh* const    target
        , const XMLCh* const    src
    );

    static bool copyNString
    (
                XMLCh* const    target
        , const XMLCh* const    src
        , const unsigned int    maxChars
    );

    static unsigned int hash
    (
        const   char* const     toHash
        , const unsigned int    hashModulus
    );

    static unsigned int hash
    (
        const   XMLCh* const    toHash
        , const unsigned int    hashModulus
    );

    static unsigned int hashN
    (
        const   XMLCh* const    toHash
        , const unsigned int    numChars
        , const unsigned int    hashModulus
    );

    static int indexOf(const char* const toSearch, const char ch);

    static int indexOf(const XMLCh* const toSearch, const XMLCh ch);

    static int indexOf
    (
        const   char* const     toSearch
        , const char            chToFind
        , const unsigned int    fromIndex
    );

    static int indexOf
    (
        const   XMLCh* const    toSearch
        , const XMLCh           chToFind
        , const unsigned int    fromIndex
    );

    static int lastIndexOf(const char* const toSearch, const char ch);

    static int lastIndexOf(const XMLCh* const toSearch, const XMLCh ch);

    static int lastIndexOf
    (
        const   char* const     toSearch
        , const char            chToFind
        , const unsigned int    fromIndex
    );

    static int lastIndexOf
    (
        const   XMLCh* const    toSearch
        , const XMLCh           ch
        , const unsigned int    fromIndex
    );

    static void moveChars
    (
                XMLCh* const    targetStr
        , const XMLCh* const    srcStr
        , const unsigned int    count
    );

    static void subString
    (
                char* const    targetStr
        , const char* const    srcStr
        , const int            startIndex
        , const int            endIndex
    );

    static void subString
    (
                XMLCh* const    targetStr
        , const XMLCh* const    srcStr
        , const int             startIndex
        , const int             endIndex
    );

    static char* replicate(const char* const toRep);

    static XMLCh* replicate(const XMLCh* const toRep);

    static bool startsWith
    (
        const   char* const     toTest
        , const char* const     prefix
    );

    static bool startsWith
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    prefix
    );

    static bool startsWithI
    (
        const   char* const     toTest
        , const char* const     prefix
    );

    static bool startsWithI
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    prefix
    );

    static bool endsWith
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    suffix
    );

    static const XMLCh* findAny
    (
        const   XMLCh* const    toSearch
        , const XMLCh* const    searchList
    );

    static XMLCh* findAny
    (
                XMLCh* const    toSearch
        , const XMLCh* const    searchList
    );

    static unsigned int stringLen(const char* const src);

    static unsigned int stringLen(const XMLCh* const src);

    static bool isValidNCName(const XMLCh* const name);

    static bool isValidEncName(const XMLCh* const name);

    static bool isAlpha(XMLCh const theChar);

    static bool isDigit(XMLCh const theChar);

    static void cut
    (
                XMLCh* const    toCutFrom
        , const unsigned int    count
    );

    static char* transcode
    (
        const   XMLCh* const    toTranscode
    );

    static bool transcode
    (
        const   XMLCh* const    toTranscode
        ,       char* const     toFill
        , const unsigned int    maxChars
    );

    static XMLCh* transcode
    (
        const   char* const     toTranscode
    );

    static bool transcode
    (
        const   char* const     toTranscode
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
    );

    static void trim(char* const toTrim);

    static void trim(XMLCh* const toTrim);

    static RefVectorOf<XMLCh>* tokenizeString(const XMLCh* const tokenizeSrc);

    static bool isInList(const XMLCh* const toFind, const XMLCh* const enumList);

    static XMLCh* makeUName
    (
        const   XMLCh* const    pszURI
        , const XMLCh* const    pszName
    );

    static unsigned int replaceTokens
    (
                XMLCh* const    errText
        , const unsigned int    maxChars
        , const XMLCh* const    text1
        , const XMLCh* const    text2
        , const XMLCh* const    text3
        , const XMLCh* const    text4
    );

    static void upperCase(XMLCh* const toUpperCase);

    static void lowerCase(XMLCh* const toLowerCase);

    static bool isWSReplaced(const XMLCh* const toCheck);

    static bool isWSCollapsed(const XMLCh* const toCheck);

    static void replaceWS(XMLCh* const toConvert);

    static void collapseWS(XMLCh* const toConvert);

private:

    XMLString();
    ~XMLString();

    static void initString(XMLLCPTranscoder* const defToUse);
    static void termString();

    static bool validateRegion(const XMLCh* const str1, const int offset1,
                        const XMLCh* const str2, const int offset2,
                        const unsigned int charsCount);

    friend class XMLPlatformUtils;
};

// ---------------------------------------------------------------------------
//  Inline some methods that are either just passthroughs to other string
//  methods, or which are key for performance.
// ---------------------------------------------------------------------------
inline void XMLString::moveChars(       XMLCh* const    targetStr
                                , const XMLCh* const    srcStr
                                , const unsigned int    count)
{
    // Plain forward copy of count characters; no bounds are checked here.
    XMLCh* outPtr = targetStr;
    const XMLCh* inPtr = srcStr;
    for (unsigned int index = 0; index < count; index++)
        *outPtr++ = *inPtr++;
}

inline unsigned int XMLString::stringLen(const XMLCh* const src)
{
    // A null pointer and an empty string both have length zero.
    if (src == 0 || *src == 0)
    {
        return 0;
    }
    else
    {
        const XMLCh* pszTmp = src + 1;

        while (*pszTmp)
            ++pszTmp;

        return (unsigned int)(pszTmp - src);
    }
}

inline bool XMLString::startsWith(  const   XMLCh* const    toTest
                                    , const XMLCh* const    prefix)
{
    // Compare only the first stringLen(prefix) characters.
    return (compareNString(toTest, prefix, stringLen(prefix)) == 0);
}

inline bool XMLString::startsWithI( const   XMLCh* const    toTest
                                    , const XMLCh* const    prefix)
{
    return (compareNIString(toTest, prefix, stringLen(prefix)) == 0);
}

inline bool XMLString::endsWith(const XMLCh* const toTest,
                                const XMLCh* const suffix)
{
    // Compare the last stringLen(suffix) characters of toTest against suffix.
    unsigned int suffixLen = XMLString::stringLen(suffix);

    return regionMatches(toTest, XMLString::stringLen(toTest) - suffixLen,
                         suffix, 0, suffixLen);
}

inline XMLCh* XMLString::replicate(const XMLCh* const toRep)
{
    // If a null string, return a null string!
    XMLCh* ret = 0;
    if (toRep)
    {
        const unsigned int len = stringLen(toRep);
        ret = new XMLCh[len + 1];
        XMLCh* outPtr = ret;
        const XMLCh* inPtr = toRep;
        // Copy len + 1 characters so the terminating null is included.
        for (unsigned int index = 0; index <= len; index++)
            *outPtr++ = *inPtr++;
    }
    return ret;
}

inline bool XMLString::validateRegion(const XMLCh* const str1,
                                      const int offset1,
                                      const XMLCh* const str2,
                                      const int offset2,
                                      const unsigned int charsCount)
{
    // Both regions must start at a non-negative offset and lie entirely
    // within their respective strings.
    if (offset1 < 0 || offset2 < 0 ||
        (offset1 + charsCount) > XMLString::stringLen(str1) ||
        (offset2 + charsCount) > XMLString::stringLen(str2) )
        return false;

    return true;
}

#endif
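
The inline endsWith() above computes its start offset as an unsigned difference of two lengths and passes the result as an int offset. When the suffix is longer than the tested string, that difference wraps around, and the call only fails cleanly if the converted offset is rejected further down. The following is a minimal standalone sketch, not the Xerces implementation, of the same suffix test with an explicit length guard. The names `Ch`, `lenSketch` and `endsWithSketch` are hypothetical, and `char16_t` is assumed here as a stand-in for `XMLCh` (this requires a C++11 compiler).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for XMLCh (assumption: a 16-bit code unit type).
typedef char16_t Ch;

// Same contract as the inline stringLen() in the header:
// a null pointer or an empty string yields 0.
inline std::size_t lenSketch(const Ch* src)
{
    if (src == 0)
        return 0;

    const Ch* p = src;
    while (*p)
        ++p;
    return static_cast<std::size_t>(p - src);
}

// Suffix test with an explicit guard, so the subtraction below can
// never wrap around.
inline bool endsWithSketch(const Ch* toTest, const Ch* suffix)
{
    const std::size_t testLen = lenSketch(toTest);
    const std::size_t suffixLen = lenSketch(suffix);

    // A suffix longer than the string cannot match.
    if (suffixLen > testLen)
        return false;

    // Compare the tail of toTest with suffix, one code unit at a time.
    const Ch* tail = toTest + (testLen - suffixLen);
    for (std::size_t i = 0; i < suffixLen; ++i)
    {
        if (tail[i] != suffix[i])
            return false;
    }
    return true;
}
```

With this guard, a call such as `endsWithSketch(u"xsd", u"schema.xsd")` returns false without relying on how an out-of-range unsigned value converts to int.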

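The private validateRegion() helper checks that a region of charsCount characters starting at each offset lies inside its string. Its sum of an int offset and an unsigned count is evaluated in unsigned arithmetic, so a very large count could in principle wrap. As an illustration only, the sketch below expresses the same bounds rule without adding offset and count, and pairs it with a simple case-sensitive region comparison. All names (`Ch`, `lenOf`, `regionInBounds`, `regionMatchesSketch`) are hypothetical, `char16_t` is assumed as a stand-in for `XMLCh`, and the actual regionMatches() body is not shown in this header, so this is not a copy of it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for XMLCh (assumption: a 16-bit code unit type).
typedef char16_t Ch;

// Length of a null-terminated string; a null pointer counts as empty.
inline std::size_t lenOf(const Ch* s)
{
    std::size_t n = 0;
    if (s != 0)
    {
        while (s[n])
            ++n;
    }
    return n;
}

// True if [offset, offset + count) lies inside s. Written as a
// subtraction so that no sum can overflow.
inline bool regionInBounds(const Ch* s, int offset, unsigned int count)
{
    if (offset < 0)
        return false;

    const std::size_t len = lenOf(s);
    const std::size_t off = static_cast<std::size_t>(offset);
    return off <= len && count <= len - off;
}

// Case-sensitive comparison of two regions; false if either region is
// out of bounds.
inline bool regionMatchesSketch(const Ch* s1, int off1,
                                const Ch* s2, int off2,
                                unsigned int count)
{
    if (!regionInBounds(s1, off1, count) || !regionInBounds(s2, off2, count))
        return false;

    const Ch* p1 = s1 + off1;
    const Ch* p2 = s2 + off2;
    for (unsigned int i = 0; i < count; ++i)
    {
        if (p1[i] != p2[i])
            return false;
    }
    return true;
}
```

Checking the bounds before touching either buffer keeps the comparison loop free of range tests, which is the same division of labour the header suggests between validateRegion() and the region comparison methods.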

Copyright © 2000 The Apache Software Foundation. All Rights Reserved.