IBM Corporation October 2003 ___________________________________________________________ IBM DB2 Intelligent Miner Scoring Version 8.1 PTF IP22690 Readme ___________________________________________________________ CONTENTS ________ This file includes the following information: 1.0 ABOUT THIS README FILE 2.0 CONTENTS OF THIS PTF 2.1 PMML 2.1 support 2.2 Support of PMML models generated by SAS EM 2.3 Model Caching 2.4 Java Single Record Scorer 2.5 Problems fixed by this PTF 3.0 INSTALLATION AND SETUP 3.1 Prerequisites 3.2 Installing IM Modeling PTF IP22690 on AIX 3.3 Installing IM Modeling PTF IP22690 on Linux 3.4 Installing IM Modeling PTF IP22690 on Solaris 3.5 Installing IM Modeling PTF IP22690 on Windows 3.6 Database Manager Instance Migration 3.7 Database Migration 4.0 DEINSTALLATION OF THIS PTF 5.0 KNOWN ISSUES 5.1 Visualizing Neural Clustering Models 5.2 Error SQL0901N with reason "Bad Compiler return code!" when building a task 5.3 DB2 DUMP file written after stopping DB2 6.0 NOTICES 6.1 Trademarks and service marks 1.0 ABOUT THIS README FILE Welcome to the IBM DB2 Intelligent Miner Scoring Version 8.1 PTF IP22690 for AIX, Linux, Solaris and Windows. To install this PTF, see below the section INSTALLATION AND SETUP. This PTF is referred to as IP22690. On Windows, Solaris, and Linux it consists of this single PTF, only. On AIX this PTF technically is divided up into the following PTFs: IP22690, IP22691, IP22693, IP22694, IP22695, IP22696, IP22697, IP22700. So, the term IP22690 refers to one PTF on Windows, Solaris, and Linux, but to several PTFs on AIX. After installing this PTF you will have V8.1.0.1 of IM Scoring on your system. You must migrate all databases enabled for IM Scoring after installing this PTF. For migration steps see section 3.6 Database Manager Instance Migration and section 3.7 Database Migration 2.0 CONTENTS OF THIS PTF 2.1 PMML 2.1 support In addition to the tags defined in core PMML 2.1, the following features are supported: * new elements 'Target' and 'Output', * new elements 'DefineFunction', 'Apply' and a number of pre-defined functions * 'MapValues' supports attributes 'mapMissingTo' and 'defaultValue' * support for additional datatypes 'date', 'time', 'dateTime', 'dateDaysSince[]', 'timeSeconds' and 'dateTimeSecondsSince[]' in 'DataDictionary' * Regression: Additional normalization methods logit, probit. Regression table supports 'PredictorTerm' elements. * Association rule models support an additional attribute 'lift' * Clustering models support additional element 'MissingValueWeights' * Neural supports additional activation functions 'cos' and 'radialBasis', neurons can have attribute 'altitude' 2.2 Support for SAS Models Models generated by SAS V9.1 Enterprise Miner can now be used with IM Scoring. The following algorithms are supported: Tree, regression, clustering and neural networks. Models can be used like any other model, no special options are needed. Note that the following SAS options are not supported: * Decision processing including estimated profit and loss values. * Neural networks with radial combination function and non-exponential activation function. * Neural networks with multiple targets. * Additional transformations and models not defined by PMML. Furthermore, the probabilities in Tree Classification models with user-set prior probabilities may be inprecise. For further details, please refer to your SAS V9.1 Enterprise Miner documentation. 2.3 Model Caching This PTF introduces a memory cache for mining models that significantly improves the performance of scoring runs and allows realtime scoring. Realtime Scoring means to have a very fast response time for a scoring run of a single input record with an execution time that is typically not bigger than a few hundred milliseconds. The model caching facility realizes realtime scoring, which is possible when the user has performed at least one (not realtime) scoring run with the same model before, and when this model has been inserted into (and not yet removed from) the cache. The model caching facility allows to keep models in main memory for subsequent scoring runs. Without a model cache, a model is deleted from main memory when the scoring run (typically done in a SELECT statement) has finished. When a model is permanently stored in memory, scoring runs using this model perform faster because the initial loading and parsing phase can be skipped. There are some SQL functions to inspect and control the model cache, but basically model caching is transparent to the user, i.e. a model used in a SELECT statement is automatically inserted into the cache, and older models are removed from the cache. There are also new apply UDFs that identify the model by its name and do not require to pass the complete model as large object (LOB) to the function. The model is searched in the cache by its name or, if not present, is loaded from one of the sample tables that can be created when enabling the database. These UDFs additionally speed up record scoring in two orders of magnitude. So it is recommended to use these new apply functions. The model cache is only available for DB2 V8. Do not use the new cache functions described here in DB2 V7. Furthermore, you must enable your database in fenced mode, the unfenced mode does not support model caching. The cache is switched off by default, i.e. the default size of the model cache is set to 0 models. There is one model cache per database instance. The cache is located in the db2fmp process of DB2 V8. In case this process crashes for any reason, DB2 creates a new db2fmp process. In such a situation the models in the cache would be lost and the new process starts with a new empty cache. New UDFs and configuration parameters for model caching a) Setting model cache size The following call allows the cache to store three models: CALL DM_setEnv( 'IDM_MX_MAX_OPEN_MODELS', '3'); The cache is switched off by: CALL DM_setEnv( 'IDM_MX_MAX_OPEN_MODELS', '0'); Cached models can be of two types: 'pinned' and 'unpinned'. The variable IDM_MX_MAX_OPEN_MODELS controls the number of unpinned models in the cache only. Models implicitly loaded into the cache via an apply UDF are always unpinned, which means they can be removed from the cache at any time if a new model needs cache space. Pinned models must be explicitly inserted into and removed from the cache, see function DM_openModel below. b) Loading pinned models into the cache DM_openModel(model, model alias) model: The model (Type: DM_CLUSTERINGMODEL, DM_CLASMODEL, or DM_REGRESSIONMODEL) model alias: A unique (per database instance) user-specified name for the model (Type: VARCHAR(256)). If NULL is passed, the model does not get an alias. The UDF returns a value of type BIGINT, which is a unique ID for the model. The ID is used to close the model via DM_closeModel. The UDF inserts the model into the cache independent of the cache size specified by IDM_MX_MAX_OPEN_MODELS. The model is pinned, i.e. the model can only be removed from the cache explicitly via DM_closeModel or DM_closeAllModels. It is allowed to load the same model more than one time into the cache, however, the alias of these models must be the same. This can improve scoring performance on a parallel system if several database users score records using the same model, because the scoring requests are equally distributed between the various instances of the model. The model alias is used to identify the model in the new apply functions (see below). Example: SELECT IDMMX.DM_openModel(MODEL, MODELNAME) FROM IDMMX.CLUSTERMODELS WHERE MODELNAME = 'DemoBanking'; c) Explicitly removing a model from the cache DM_closeModel(cachedModelID) cachedModelID: ID of the model as it is returned by DM_openModel (Type: BIGINT) The UDF returns a value of type INTEGER; the value is 1 if a model with the passed ID was found in and removed from the cache, 0 else. Example: VALUES (DM_closeModel( )); d) Removing all models from the cache DM_closeAllModels(removePinned) removePinned: When removePinned is set to 1, unpinned and pinned models are removed, otherwise only the unpinned models are removed. (Type: INTEGER) Example: CALL DM_closeAllModels(1); e) New scoring apply UDFs IDMMX.DM_applyClusModel(model name, data record) IDMMX.DM_applyClasModel(model name, data record) IDMMX.DM_applyRegModel(model name, data record) model name: The model name is used as model alias to find the model in the cache. Use the function DM_openModel to load a pinned model into the cache and assign a model alias to this model. If no model was found in the cache, the function tries to locate the model in the corresponding default model table (IDMMX.CLUSTERMODELS, IDMMX_CLASSIFMODELS, or IDMMX.REGRESSIONMODELS), and, if successful, reads the model as unpinned model into the cache, using model name as model alias for the new cached model (Type: VARCHAR(256)). data record: Input data record in the format returned by REC2XML(1.0, 'COLATTVAL', '', ...) (Type: VARCHAR(32672)) The UDFs apply a model specified by a model name to a data record. Parameters are passed as VARCHAR (not as LOBs) to improve performance. The UDFs return values of type DM_ClusResult, DM_ClasResult, or DM_RegResult. These UDFs work also in unfenced mode. The model is directly selected from the corresponding default model table and, if successful, applied to the data record. These functions are only available if the database was enabled with parameter "tables". Example: WITH SCORER_INPUT(TYPE, AGE, SIBLINGS, INCOME) AS ( VALUES( 'red ', 27, 1, 6611 ) ), SCORER_RESULT(RESULT) AS ( SELECT IDMMX.DM_applyClusModel('DemoBanking', rec2xml( 1.0, 'COLATTVAL', '', TYPE, AGE, SIBLINGS, INCOME) AS RESULT FROM SCORER_INPUT ) SELECT IDMMX.DM_getClusterID(RESULT) AS CLUSTER_ID, FROM SCORER_RESULT; f) Inspecting the model cache DM_getOpenModels This table function returns the contents of the model cache as a table. For every model in the cache, one row is returned. The returned table has the following columns: MODELID Type: BIGINT: Contains the ID of the model in the cache. MODELNAME Type: VARCHAR(240): Contains the name of a model as it is found in the PMML representation ALIAS Type: VARCHAR(256): Contains the alias for a model, if specified in DM_openModel, an empty string else USER Type: VARCHAR(128): Application Authorization ID TIMESTMP Type: TIMESTAMP: Last access time of the model PINNED Type: INTEGER: Indicates if the model is pinned into the cache (1), or if the model is unpinned (0) SIZE Type: INTEGER: Size of the model in bytes. The value is the size of the PMML representation of the model. USECOUNT Type: INTEGER: Number of scoring runs that currently use the model Example: SELECT CT.* FROM TABLE(IDMMX.DM_getOpenModels()) CT 2.4 Java Single Record Scorer The Java Single Record Scorer can now read mining models not just from files, but also from a database table. To this end, the Java class "RecordScorer" has been extended by a new constructor and a new method (openModel) which allow to read models from the database via JDBC. The model must exist in a table that has at least two columns, one column of type VARCHAR that identifies the model, and one column of type DM_CLUSTERINGMODEL, DM_CLASMODEL, or DM_REGRESSIONMODEL, that contains the model itself. For the API description see the javadoc documentation. 2.5 Problems fixed by this PTF Confidences for neural classification models from producers other than Intelligent Miner for Data are now the activation of the respective output neuron. Neural classification and prediction models will not give results anymore in case all input values are missing 3.0 INSTALLATION AND SETUP 3.1 Prerequisites On AIX, Linux, and Solaris, you must have successfully installed the IBM DB2 Intelligent Miner Scoring Version 8.1. On Windows, the IP22690.exe installs a try & buy version of the product if IBM DB2 Intelligent Miner Scoring Version 8.1 has not been installed previously. 3.2 Installing IBM DB2 Intelligent Miner Scoring Version 8.1 PTF IP22690 on AIX. As stated above, the PTF for AIX is divided up into the following PTFs: IP22690, IP22691, IP22693, IP22694, IP22695, IP22696, IP22697, IP22700. To install the PTFs on AIX, follow these steps: - Download all files into a directory - Log on as user root - Type smit or smitty on the AIX command line - Select the following options: a. Software Installation and Maintenance b. Install and Update Software c. Update Software by Fix (APAR) d. Enter input device/path containing update packages e. List FIXES to install f. Install this fix (IC36789) Note: Several sysck warnings (3001-036) occur during installation of fileset IMinerX.common.db2 8.1.0.3. These warnings can be ignored. (The term 8.1.0.3 here is about the filesets vresion, not about the product version, which is 8.1.0.1.) If you have installed IBM DB2 Intelligent Miner Scoring on AIX 5.2, you should install fileset bos.rte.libc 5.2.0.14 (or later). If this fix is not installed on your AIX 5.2 system, you may experience error "E 2001: unexpected token: ; expected token: ..." when importing a model created by DB2 Intelligent Miner for Data. If you install IBM DB2 Intelligent Miner Modeling Version 8.1 after this PTF has been applied, you must not install the filesets "IMinerX.common.db2" and "IMinerX.base" of IM Modeling Version 8.1. This is because newer versions of both filesets are contained in this Scoring PTF. 3.3 Installing IBM DB2 Intelligent Miner Scoring Version 8.1 PTF IP22690 on Linux To install the PTF on Linux, follow these steps: The PTF is composed of 8 files beginning with 'IMinerX' and marked by the version 8.1.0.1, as well as 4 installation scripts beginning with 'linuxInstall' and 4 deinstallation scripts beginning with 'linuxUninstall'. In the directory containing the PTF, issue the following command: ./linuxInstallSc to update IM Scoring for DB2 ./linuxInstallScO to update IM Scoring for Oracle ./linuxInstallCnv to update IM Scoring conversion utilities ./linuxInstallScoringBean to update IM Scoring Java Record Scorer Note that you will not be able to install IBM DB2 Intelligent Miner Modeling Version 8.1 after this PTF has been applied. You have to install this PTF afterwards! 3.4 Installing IBM DB2 Intelligent Miner Scoring Version 8.1 PTF IP22690 on Solaris To install the PTF on Solaris, follow these steps: - Download all files into a directory. - Extract the files from IP22690.tar.gz: gunzip IP22690.tar.gz tar xvf IP22690.tar - Log on as 'root' and change to the directory containing the files extracted in the previous step. - Issue the following command: ./sunInstallPTF Note that if you install IBM DB2 Intelligent Miner Modeling Version 8.1 after this PTF has been applied, you have to install this PTF once again! 3.5 Installing IBM DB2 Intelligent Miner Scoring Version 8.1 PTF IP22690 on Windows To install the PTF on Windows, follow these steps: - Stop DB2. - Execute the file IP22690.exe. - In the initial dialog, choose the language for your installation. - Follow the instructions that are displayed. In the Custom Setup Dialog, you can choose between several features. Features that you select will be installed; features that you do not select will be deinstalled if you had selected them in the original IBM DB2 Intelligent Miner Scoring Version 8.1 installation. - Reboot your system. 3.6 Database Manager Instance Migration After installation of this PTF you must migrate your instance configuration. This step is necessary on AIX, Linux and Solaris platforms. This step is not necessary on Windows platforms. A DB2 instance must have access to the libraries containing the IM Modeling/Scoring functions in order to use IM from that instance. Some library names and installation paths are changed in this PTF. To migrate the DB2 instances to the new version of IM you must rerun the script "idminstfunc". This command is described in the Scoring manual in chapter "Configuring the database management system on UNIX platforms" "Enabling the DB2 instance on UNIX platforms". 3.7 Database Migration After installation of this PTF you must migrate all databases enabled for IM Scoring. This step is necessary on all platforms. IM Scoring works with database objects like user defined types (UDT), functions (UDF), methods (UDM) and stored procedures (SP). These objects must be created in each DB2 database to be used with IM. This step is called "enabling the database". New database objects have been added in this PTF and some database objects need to be updated. To update your database enabling rerun the "idmenabledb" command on all enabled databases. To get a list of databases for your database instance use the DB2 command: db2 list db directory All local databases already enabled with a previous version of IM Scoring should be migrated. The enablement status of existing databases can be checked with the idmcheckdb command: idmcheckdb Sample output: E:\>idmcheckdb sample The database "sample" is enabled for IM Modeling Version 8.1 and IM Scoring Version 8.1 in "fenced" mode. To use this database with the current version 8.1.0.1 (i.e. after the installation of this PTF) you must migrate the database. Use command idmenabledb to migrate your database. The databases are migrated by calling idmenabledb again. You do not need to disable (idmdisabledb) a database before invoking idmenabledb. idmenabledb [fenced|unfenced] [tables|notables] [dbcfg|nodbcfg] [ClasModelSize ] [RuleModelSize ] [ClusModelSize ] [RegModelSize ] [StructSize ] [ApplDataSize ] [ResultSize ] The fenced|unfenced option should be the same as during the first enabling of your database. If you don't know the mode, then use idmcheckdb first. The tables|notables option lets you create the sample tables if you haven't done it originally during the first enabling of your database. If neither "tables" nor "notables" is specified the user is prompted whether to create the sample tables 4.0 DEINSTALLATION OF THIS PTF To deinstall the PTF on Windows, follow these steps: After installation of this PTF, the Add/Remove Programs Window displays the following Product: IBM DB2 Intelligent Miner Scoring 8.1.0.1 This PTF upgrades IM Scoring 8.1 to IM Scoring 8.1.0.1. Therefore you cannot deinstall the PTF only, you must deinstall the complete product IM Scoring 8.1.0.1, and then reinstall the original version IBM DB2 Intelligent Miner Scoring 8.1. If IM Modeling and IM Scoring are installed and you want to deinstall IM Modeling 8.1 as well as IM Scoring 8.1.0.1, it is recommended to deinstall IM Modeling 8.1 (and possible IM Modeling PTFs) first and then IM Scoring 8.1.0.1. If you chose another deinstallation sequence, some files cannot be removed automatically by the deinstallation process. 5.0 KNOWN ISSUES 5.1 Visualizing Neural Clustering Models After installing the PTF IP22690, the Neural Clustering models created with IBM Intelligent Miner for Data cannot be visualized with IBM DB2 Intelligent Miner Visualization. This happens in the following cases: 1. You exported a Neural Clustering model from IBM Intelligent Miner for Data using the Intelligent Miner Format. You convert this model file to the PMML 2.1 format using the conversion utilities of IBM DB2 Intelligent Miner Scoring and try to visualize the resulting PMML file with IBM DB2 Intelligent Miner Visualization. 2. You exported a Neural Clustering model from IBM Intelligent Miner for Data using the Intelligent Miner Format. You import this model into DB2 using IBM DB2 Intelligent Miner Scoring and try to visualize the resulting model object of DB2 with IBM DB2 Intelligent Miner Visualization. 3. You create a new Neural Clustering model or open an old model in IBM Intelligent Miner for Data client version 8.1 installed on AIX. In this case, it is recommended not to install the PTF on this machine. Please check the IBM DB2 Intelligent Miner Visualization support web page for latest PTFs or contact your IBM support contact. 5.2 Error SQL0901N with reason "Bad Compiler return code!" when building a task When calling several methods in the same SQL statement, you sometimes receive a SQL error SQL0901N like in the following example: > db2 "values (IDMMX.DM_LogicalDataSpec() ..DM_addDataSpecFld('field1') ..DM_addDataSpecFld('field2') ..DM_addDataSpecFld('field3') ..DM_addDataSpecFld('field4') )" SQL0901N The SQL statement failed because of a non-severe system error. Subsequent SQL statements can be processed. (Reason "Bad Compiler return code!".) SQLSTATE=58004 This is a known bug of DB2 v7 that is fixed in fixpack 10. Please upgrade your DB2 v7 to the latest fixpack, at least fixpack 10. The error has not been observed on DB2 v8 yet. 5.3 DB2 DUMP file written after stopping DB2 After using IM Scoring on AIX with DB2 V7.2 FP10 and terminating the DB2 process in which IM Scoring has been run, DB2 may write a dump file in the db2dump directory of the DB2 instance. As the DB2 client is not informed about it, pls. check your db2dump directory. 6.0 Notices This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to the IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A. For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to: IBM World Trade Asia Corporation Licensing 2-31 Roppongi 3-chome, Minato-ku Tokyo 106, Japan The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION 'AS IS' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact: IBM Deutschland Informationssysteme Department 3982 Pascalstrasse 100 70569 Stuttgart Germany Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee. The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us. Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurement may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. 6.1 Trademarks The following terms are trademarks of the IBM Corporation in the United States, other countries, or both: AIX AFS Database 2 DataJoiner DB2 DB2 Universal Database IBM Intelligent Miner iSeries MVS MVS/ESA OpenEdition OS/390 OS/400 pSeries @.server RACF RS/6000 SP System/390 Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark in the United States, other countries, or both and is licensed exclusively through X/Open Company Limited. Other company, product, and service names may be trademarks or service marks of others. (C) Copyright IBM Corporation 1999, 2003. All rights reserved.