IBM InfoSphere Global Name Recognition Version 4.2.0 Fix Pack 2 - Release Notes

These release notes contain information about IBM InfoSphere Global Name Recognition, Version 4.2 Fix Pack 2, including installation notes, known issues, fixed problems, and usage notes.

For the latest version of the release notes, see the product support site at ibm.com.
http://www.ibm.com/support/entry/portal/Overview/Software/Information_Management/InfoSphere_Global_Name_Recognition

Contents
- System requirements
- Performance considerations
- Installing Version 4.2 Fix Pack 2
- List of fixes and corrections
- Product documentation
- Known issues and changes to using the product
- Announcements

===============================================================================================================
System requirements

For the latest information about hardware and software compatibility, see the detailed system requirements document at http://www-01.ibm.com/support/docview.wss?&uid=swg27019150.

===============================================================================================================
Performance considerations

Searches should not exceed five tokens when running a search against a large data list (5 million or more names) on a computer that runs Linux for S/390. Including more than five tokens in your search query can return results that exceed the limit size, causing a transaction timeout that leads web services to fail.

See the product information center for more performance information.

===============================================================================================================
Installing Version 4.2 Fix Pack 2

You must complete the following steps to run the installation program to install IBM® InfoSphere™ Global Name Recognition Version 4.2 Fix Pack 2.

Before you begin
----------------
You must install IBM InfoSphere Global Name Recognition Version 4.2 before installing Fix Pack 2.
To install the fix pack, run the installer from the product media, or copy the product installer package, including the executable, to a local drive. The product installer cannot be run from a network drive.

If you are upgrading, ensure that your existing product version is supported.

Note: If you are upgrading GNR v4.2.0.0.hf2 or 4.2.0.0.hf3, you must first grant the installing user permission to write to the /data/kanjiData.ibm file.

Configuration files: The installer now writes new configuration files to *.config.template and does not overwrite existing configuration files.

Procedure
---------
1. Obtain the product media for the fix pack.
2. Run the installation program:
   GUI mode
   - Navigate to the /Disk1/InstData/VM/ directory on the product media for your platform.
   - Run the install executable.
   Command line mode
   - Open a command prompt or shell window.
   - Navigate to the /Disk1/InstData/VM/ directory on the product media for your platform.
   - Run the install executable with the -i console option:
       install -i console
3. Follow the instructions in the installation wizard to install the fix pack.
   a) On the Introduction panel, review the information and ensure that your computers meet all system requirements.
   b) On the Destination panel, browse to the directory (fully qualified path) where you want to install the fix pack. This directory must be the directory that contains your existing IBM InfoSphere Global Name Recognition Version 4.2 installation.
   c) On the Pre-Installation Summary panel, review the summary. When you are finished, click Next to complete the installation.
   d) On the Install Complete panel, review the status, and then click Done to exit the installation.
      When installing on Solaris systems, you might see the "No such file or directory" message on the final window. You can safely ignore this warning message.
===============================================================================================================
List of fixes and corrections
-----------------------------
The following list describes the errors or problems corrected with Version 4.2 Fix Pack 2:

Distributed Search
- Name Preprocessor did not allow for customization of external files for parsing.
- No error message appeared when Name Preprocessor could not write to a file because there was not enough space.

Linguistic precision enhancements
A number of improvements have been made to name regularization, variants, and qualifiers, including:
- Improved matching of numbers to ordinals (18 vs 18th).
- The regularization rules file (genericOnRegRule file) has been updated to handle radio station frequency identifier information (for example, Fiesta Radio 98.6 F.M.).
- Tokens in organizational names are now correctly regularized.
- Regularization rules were changed for German. German spelling variations were added to the variant file.
- Enhancements to titles, affixes, and qualifiers (TAQ) include additions for Malay names.
- "M" is no longer a variant for Mohamed.

NameHunter configuration
- Added a method to configure whether blank fields are treated as "unknown" or "no name".
- Improved algorithm for edit distance calculations (Jaro-Winkler).
- The left bias parameter for Indian given names is now set to FALSE by default.

NameParser
- Affix rules for phrase creation that were applied too strictly have been adjusted (for example, "Abdul Hussain" in India).
- Various other improvements to NameParser logic.

NameSifter
- As noted in the Version 4.2 documentation, NameSifter can generate unpredictable results when used with non-English organization names. This limitation includes English names with European business suffixes such as E.U. and S.A. Do not attempt to use NameSifter with name data that is not based on United States English.
  It is intended for use with personal and organizational names from the United States only.

Name Preprocessor
- Improved handling of conjoined names.
- Name Preprocessor output and configuration have changed. See the relevant sections in the information center or PDFs. The changes are summarized as follows:
  Name Preprocessor no longer generates .nh files. It splits the interim .npp file into multiple files. As a result, nhFile and numNhFiles are replaced by nppOutFile and numNppOutFiles. createNh and createNpp have been removed.
  Note: Distributed Search and NameWorks Embedded Search users should re-run Name Preprocessor to generate new data list files. NameWorks Embedded Search users should set createOrig=false in npp.config when running Name Preprocessor to generate output (used as data list files). Distributed Search users should continue to set createOrig to true or false depending on whether they perform a unique or full search.

Search - general
- Improved search handling of compressed names.

Transliteration
- Transliteration of Kanji names is improved, including names containing spaces.
- ICU Any-to-Latin transliteration is now included for use as a fallback transliterator.

Web Services
- The Web Service API and GNM-Name Analyzer can now be accessed simultaneously.

===============================================================================================================
Product documentation

You can find product documentation with fix pack updates for Version 4.2 in the following places:

Version 4.2 information center
  Access at ibm.com®:
  http://publib.boulder.ibm.com/infocenter/gnrgnm/v4r2m0/index.jsp
  Access on your local server, installed as part of the product install:
  Open a web browser and enter the URL for your server:
  http://:/help/index.jsp
  The documentation port number that you specify during installation.
  WebSphere® Application server hostname or IP address.

Culture reference for Name Analyzer
  Culture reference information is provided within the IBM InfoSphere Global Name Recognition Name Analyzer tool.

IBM® product Support home
  Access at ibm.com:
  http://www.ibm.com/support/entry/portal/Overview/Software/Information_Management/InfoSphere_Global_Name_Recognition
  In addition to technotes and other Support-related information, the site contains links to the information center, PDF versions of the product information, and the latest updates of the release notes.

===============================================================================================================
Known issues and changes to using the product

Known problems are documented in the form of individual technotes in the Support portal at http://www.ibm.com/support/entry/portal/Overview/Software/Information_Management/InfoSphere_Global_Name_Recognition
- Under Search support, enter a keyword, phrase, error code, or APAR number to search on.
- Select Within my selected products.
- Click the search arrow icon.

As problems are discovered and resolved, the IBM Support team updates the Support portal. By searching the Support portal, you can quickly find workarounds or solutions to problems.

At time of publication, there were no known installation problems. Check the Support portal for the most current information.

Memory consumption notes for Embedded Search on AIX

Memory consumption can be high on AIX machines when running Embedded Search. Possible reasons and corrective actions:
- By design, freed memory is not returned to the system on AIX. Memory is returned to the process and used as cache. As a result, memory page size can increase. Disclaiming the memory space can force memory to return to the system, but this can impact performance. On AIX, disclaim only large allocated spaces, selectively and programmatically, and only if needed.
- Class object size tends to take more space on AIX than on other platforms.
  For example, a NameWorks SearchMatch object is 72 bytes on 64-bit Linux but 192 bytes on AIX.
- A 16-byte memory alignment is used on VMX-enabled machines such as Power 6 or Power 7. This alignment can cause a large memory footprint for Embedded Search, which constantly allocates and uses a lot of memory.
- Bad data. Sudden jumps in memory size can result from matching on empty query names. While NameHunter allows matching on empty names, the number of search results can be very large.
- By design, NameHunter uses memory to gain speed. Some temporary storage used in search is kept in memory and reused.

Consider the following to improve performance:
- Strip out leading and trailing spaces from each query name before passing it to NameHunter for search.
- Avoid searching on empty query names.
- Set the environment variable LIBCPP_NOVMX=1 on a VMX-enabled machine (Power 6 or Power 7) to eliminate the 16-byte padding. Note: this is recommended only for Embedded Search, and the amount of memory saved is typically small.

===============================================================================================================
Announcements
-------------
The latest announcement letter is linked from the following page: http://www.ibm.com/software/data/infosphere/global-name-recognition/.

See the announcement for the following information:
- Detailed product description, including a description of new functions
- Product-positioning statement
- Packaging and ordering details
- International compatibility information
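
Example: cleaning query names before an Embedded Search

The first two performance suggestions in the memory consumption notes (strip leading and trailing spaces, avoid empty query names) can be applied in the calling application before any name reaches NameHunter. The following is a minimal sketch only. The function name clean_query_names is hypothetical and is not part of the NameHunter or NameWorks API; the cleaned names would be passed to whatever search call your application already uses.

```python
from typing import Iterable, List


def clean_query_names(names: Iterable[str]) -> List[str]:
    """Hypothetical helper, not part of the product API.

    Strips leading and trailing whitespace from each query name and
    drops names that are empty after stripping, so that empty queries
    are never submitted for search.
    """
    cleaned = []
    for name in names:
        stripped = name.strip()
        if not stripped:
            # Skip empty query names; matching on them can return
            # very large result sets.
            continue
        cleaned.append(stripped)
    return cleaned


if __name__ == "__main__":
    queries = ["  Maria Lopez ", "", "   ", "Abdul Hussain"]
    for query in clean_query_names(queries):
        print(repr(query))
```

Filtering in the caller keeps the check independent of the search configuration; whether to log or reject the dropped names is left to the application.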