Fix (APAR): PK20304
Status: Fix
Release: 6.0.2.9, 6.0.2.8, 6.0.2.7, 6.0.2.6, 6.0.2.5, 6.0.2.4, 6.0.2.3, 6.0.2.2, 6.0.2.1, 6.0.2
Operating System: AIX, HP-UX, i5/OS, Linux, Linux pSeries, Linux Red Hat - pSeries, Linux zSeries, OS/390, OS/400, Solaris, Windows, z/OS
Supersedes Fixes:
CMVC Defect: 363613
Byte size of APAR: 44273
Date: 2006-07-25

Abstract: In order to make the version 6.0.2 code function more like version 5, with quick timeouts for clients when all the EJB servers are down, two custom properties were created: one for the callback timeout and one to enable the preload function.

Description/symptom of problem:

PK20304 resolves the following problem:

ERROR DESCRIPTION:
WLM cluster data would not be gathered for a particular cluster until the moment the first request for that cluster came in. This caused a NO_IMPLEMENT to be seen on the client once (sometimes 2-3 times) until the cluster data was created and propagated. The NO_IMPLEMENT would not be seen after the data was propagated.

LOCAL FIX:
None

PROBLEM SUMMARY
USERS AFFECTED: WebSphere Application Server version 6.0.2 users of Workload Management (WLM) who are concerned about client timeout lengths or about hitting an infinite loop in WLM selection.

PROBLEM DESCRIPTION: In order to make the version 6.0.2 code function more like version 5, with quick timeouts for clients when all the EJB servers are down, two custom properties had to be created: one for the callback timeout and one to enable the preload function.

RECOMMENDATION: None

The callback timeout was hardcoded to 3 minutes, which could cause clients to time out waiting on requests when all servers are down. This is a regression of behavior from version 5, in which clients would get an immediate response that there were no members available. The reason is that the version 6 code, which allows asynchronous updates and selection, gave the WLM code no way to differentiate between a "no cluster data available" exception raised because the process is just starting up and has not yet initialized the data for that cluster, and one raised because all the servers are down. To handle the startup scenario, the callback timeout was added so that a request could wait for the cluster data to be populated and then be sent through. However, this left the problem described above: the wait would also occur when all the servers are down. There was an additional possibility of an infinite loop in selection if the retry limit was reached.

PROBLEM CONCLUSION:
A custom property was created to let the customer set the length of the callback timeout (including the ability to skip it altogether) in order to get around the NO_IMPLEMENT on first touch of a cluster. In addition, the pre-fetch logic from the next release was backported (it is called PreLoad in 6.0.x to indicate that the code is not exactly the same); when enabled, it causes a node agent to preload all of the cluster data without waiting for the first request to come in. The combination of the custom property that enables the PreLoad logic and the custom property that sets the callback timeout fixes the issue of clients timing out on requests that would not have timed out in version 5, while still allowing the first request to a cluster to succeed. An additional fix was added to solve a possible infinite loop in the selection logic when the retry limit is reached: a minor code of 40 instead of 42 is now thrown if a scenario is ever reached in which the retry attempts run out. The WLM code will retry at a high level if the minor code on a NO_IMPLEMENT is 42, but will not retry with a minor code of 40.
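For illustration only, the following simplified Python sketch models the retry decision described above. It is not WebSphere source code; the names are invented, and the bare values 40 and 42 stand in for the real vendor-prefixed CORBA minor codes. It shows why surfacing a non-retryable minor code once the retry limit is exhausted ends the loop.

    RETRYABLE = 42       # minor code the high-level WLM code retries on
    NOT_RETRYABLE = 40   # minor code surfaced once retry attempts run out

    class NoImplement(Exception):
        """Stand-in for a CORBA NO_IMPLEMENT carrying a minor code."""
        def __init__(self, minor):
            super().__init__("NO_IMPLEMENT minor=%d" % minor)
            self.minor = minor

    def select_member(attempts_left):
        # Stand-in for WLM member selection; in this sketch it always fails.
        if attempts_left > 0:
            raise NoImplement(RETRYABLE)
        # Before the fix this path also used 42, so the caller retried forever;
        # with the fix it uses 40 and the loop below terminates.
        raise NoImplement(NOT_RETRYABLE)

    def invoke_with_retry(retry_limit=3):
        attempts_left = retry_limit
        while True:
            try:
                return select_member(attempts_left)
            except NoImplement as e:
                if e.minor != RETRYABLE:
                    raise            # minor code 40: give up instead of looping
                attempts_left -= 1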
In order to enable and use either of the custom properties, take the following steps. In the administrative console, click "System Administration" on the left side, click "Cell" underneath that, then click "Custom Properties" in the middle frame. Create two properties:

IBM_CLUSTER_CALLBACK_TIMEOUT, with a value for the timeout in milliseconds (10000 is 10 seconds, for example). Click Apply, then back out to the custom properties screen and go in again to create the second property as a new entry (otherwise you will overwrite the first one).

IBM_CLUSTER_ENABLE_PRELOAD, with a value of true to enable the preload function. This value must be true if the callback timeout is set to zero. Setting it to true is not recommended for customers with extremely large topologies, as the node agent can take much longer to start up. If you have it set to true and see long node agent start times, set it to false to determine whether the issue is in the preloading of the cluster data.

Save those changes and synchronize the configuration with the nodes, then shut everything down (deployment manager, node agents and servers) and start it up again. If you have WLM trace enabled, you should see this in the trace logs:

[3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim > loadCustomProperties Entry
[3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim 1 Loaded custom property IBM_CLUSTER_ENABLE_PRELOAD false/true - the value set in the console
[3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim 1 Loaded custom property IBM_CLUSTER_CALLBACK_TIMEOUT ##### - whatever value was set in the console
[3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim > loadCustomProperties Exit

If this is seen, then the cell has loaded the custom properties correctly and they will be used at runtime where applicable.
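As an alternative to the administrative console steps above, the two properties can also be created with a wsadmin script. The following Jython sketch is illustrative only and is not part of the official instructions; it assumes wsadmin is connected to the deployment manager, and 'myCell' is a placeholder for your actual cell name. Verify the values in the console afterwards, then synchronize the nodes and restart the processes as described above.

    # Illustrative wsadmin (Jython) sketch: create the two cell-level custom
    # properties. 'myCell' is a placeholder for your cell name.
    cell = AdminConfig.getid('/Cell:myCell/')

    # Callback timeout in milliseconds (10000 = 10 seconds in this example).
    AdminConfig.create('Property', cell,
                       [['name', 'IBM_CLUSTER_CALLBACK_TIMEOUT'], ['value', '10000']])

    # Enable the PreLoad function (must be 'true' if the callback timeout is 0).
    AdminConfig.create('Property', cell,
                       [['name', 'IBM_CLUSTER_ENABLE_PRELOAD'], ['value', 'true']])

    # Persist the changes, then synchronize the nodes and restart the deployment
    # manager, node agents and servers as described above.
    AdminConfig.save()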
The fix for this APAR is currently targeted for inclusion in fixpack 6.0.2.11. Please refer to the recommended updates page for delivery information:
http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980

Directions to apply fix:

NOTE: Choose the:
1) Release the fix applies to
2) The Editions that apply
3) Delete the Editions & Methods that do not apply and this Note

Fix applies to Editions:
Release 6.0
__ Application Server (Express or BASE)
__ Network Deployment (ND)
__ WebSphere Business Integration Server Foundation (WBISF)
__ Edge Components
__ Developer
__ Extended Deployment (XD)

Install Fix to:
Method:
__ Application Server Nodes
__ Deployment Manager Nodes
__ Both

NOTE: The user must:
* Have Administrative rights in Windows, or be the actual root user in a UNIX environment.
* Be logged in with the same authority level when unpacking a fix, fix pack or refresh pack.
* Be at V6.0.2.2 or newer of the Update Installer. This can be checked by reviewing the level of the Update Installer in the file /updateinstaller/version.txt.

The Update Installer can be downloaded from the following link:
http://www.ibm.com/support/docview.wss?rs=180&uid=swg21205991

For detailed instructions to extract the Update Installer, see the following Technote:
http://www-1.ibm.com/support/docview.wss?rs=180&uid=swg21205400

1) Copy the PKxxxxx.pak file directly to the maintenance directory.
2) Shut down WebSphere. Manually execute setupCmdLine.bat in Windows or . ./setupCmdLine.sh in UNIX from the WebSphere instance that maintenance is being applied to.
3) Launch the Update Installer.
4) Enter the installation location of the WebSphere product you want to update.
5) Select the "Install maintenance package" operation.
6) Enter the file name of the maintenance package to install (the PKxxxxx.pak file that was copied into the maintenance directory).
7) Install the maintenance package.
8) Restart WebSphere.

Directions to remove fix:

NOTE:
* The user must have Administrative rights in Windows, or be the actual root user in a UNIX environment.
* FIXES MUST BE REMOVED IN THE ORDER THEY WERE APPLIED
* DO NOT REMOVE A FIX UNLESS ALL FIXES APPLIED AFTER IT HAVE FIRST BEEN REMOVED
* YOU MAY REAPPLY ANY REMOVED FIX

Example: If your system has fix1, fix2, and fix3 applied in that order and fix2 is to be removed, fix3 must be removed first, fix2 removed, and fix3 re-applied.

1) Shut down WebSphere. Manually execute setupCmdLine.bat in Windows or . ./setupCmdLine.sh in UNIX from the WebSphere instance that the uninstall is being run against.
2) Start the Update Installer.
3) Enter the installation location of the WebSphere product you want to remove the fix from.
4) Select the "Uninstall maintenance package" operation.
5) Enter the file name of the maintenance package to uninstall (PKxxxxx.pak).
6) Uninstall the maintenance package.
7) Restart WebSphere.

Directions to re-apply fix:
1) Shut down WebSphere.
2) Follow the fix instructions to apply the fix.
3) Restart WebSphere.

Additional Information: