PQ85156: IN A MULTI-SERVANT REGION ENVIRONMENT, ISSUING A STOP ON AN APPLICATION MAY NOT STOP THE APPLICATION ON ALL SERVANT REGIONS.
APAR status

Closed as program error.

Error description

When issuing a stop on an application in a multi-SR environment, the application in question may not stop on all servant regions if work has not been dispatched on all servant regions. On the servant regions where no work (for example, a client request) has been dispatched, the stop is not issued, and the application continues to run.

This behavior has also been observed when a restart is issued on an application. A restart is done when an update is attempted on an application from the admin console. In this case, the application is restarted (that is, stopped and then started again) on only one of the servant regions.

A closely related problem, which is also addressed by this APAR, is the incorrect setting of application status in the admin console when WLM shuts down a servant region in a multi-servant environment. When this occurs, the application status is erroneously reported as stopped, even though a servant where the application is running is still available.

As a side effect of this problem, you might notice multiple instances of the following error in servant regions started after the first servant is up (where wlm_minimumSRCount is set to a value greater than 1). The errors appear at 1-minute intervals after initialization and, after 15 minutes, at 5-minute intervals. You will continue to see these errors until the servant region starts processing work such as client requests, as described earlier.

BBOO0010W The function CORBA::throw_sysexcp(const char*,ULong,CompletionStatus)+886 raised CORBA system exception CORBA::UNKNOWN. Error code is C9C25790.
And more detailed tracing shows:

Trace: 2004/06/14 14:39:42.741 01 t=9DFB00 c=UNK key=P8 (13007002)
  FunctionName: com.ibm.ws390.orb.ClientDelegate
  SourceId: com.ibm.ws390.orb.ClientDelegate
  Category: EVENT
  ExtendedMessage: ClientDelegate.invoke; java.rmi.RemoteException: controller unable to connect to servant; nested exception is:
  com.ibm.websphere.management.exception.ConnectorNotAvailableException: Nonexistant stoken/IOR
  com.ibm.websphere.management.exception.ConnectorNotAvailableException: Nonexistant stoken/IOR

and on the controller side:

Trace: 2004/06/19 00:25:54.227 01 t=8CE078 c=UNK key=S2 (13007002)
  FunctionName: com.ibm.ws390.management.connector.corba.CorbaConnectorClient
  SourceId: com.ibm.ws390.management.connector.corba.CorbaConnectorClient
  Category: DEBUG
  ExtendedMessage: throwing exception;com.ibm.websphere.management.exception.ConnectorNotAvailableException: Nonexistant stoken/IOR

ADDITIONAL SYMPTOM: TPV may fail with the client-side message PMON3009W, indicating a failure to connect. This error occurs when the server has more than one servant region. In addition, "javax.management.MBeanException: Non-existant servant" exceptions are thrown on the client side. To see these exceptions, run tperfviewer from a DOS window, specifying:

tperfviewer DEBUG host port protocol

where host, port, and protocol are optional parameters. Example:

tperfviewer DEBUG myhost 8879 SOAP

Local fix

Set wlm_minimumSRCount to 1 (the default). With this setting, only one servant is started when the application server is brought up. If the workload is more than one servant can handle, WLM starts more servants when necessary.
The wlm_maximumSRCount variable has no effect on this problem and can be set to whatever value is right for your environment.

Problem summary

****************************************************************
* USERS AFFECTED: All users of WebSphere Application Server    *
*                 V5.0 for z/OS                                *
****************************************************************
* PROBLEM DESCRIPTION: In a multi-servant region environment,  *
*                      issuing a stop on an application may    *
*                      not stop that application on all        *
*                      servant regions.                        *
*                                                              *
*                      "Admin" component tracing will show:    *
*                                                              *
*                      FunctionName:                           *
*                      com.ibm.ws390.orb.ClientDelegate        *
*                      SourceId:                               *
*                      com.ibm.ws390.orb.ClientDelegate        *
*                      Category: EVENT                         *
*                      ExtendedMessage: ClientDelegate.invoke; *
*                      java.rmi.RemoteException: controller    *
*                      unable to connect to servant; nested    *
*                      exception is:                           *
*                      com.ibm.websphere.management.exception. *
*                      ConnectorNotAvailableException:         *
*                      Nonexistant stoken/IOR                  *
*                      com.ibm.websphere.management.exception. *
*                      ConnectorNotAvailableException:         *
*                      Nonexistant stoken/IOR and on the       *
*                      controller side                         *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************

When issuing a stop on an application in a multi-SR environment, the application in question may not stop on all servant regions if work has not been dispatched to all servant regions. On the servant regions where no work (for example, a client request) has been dispatched, the stop is not issued, and the application continues to run.

This behavior has also been observed when a restart is issued on an application. A restart is done when an update is attempted on an application from the admin console. In this case, the application is restarted (that is, stopped and then started again) on only one of the servant regions.
A closely related problem, which is also addressed by this APAR, is the incorrect setting of application status in the admin console when WLM shuts down a servant region in a multi-servant environment. When this occurs, the application status is erroneously reported as stopped, even though a servant where the application is running is still available.

As a side effect of this problem, you might notice multiple instances of the "Nonexistant stoken" error in servant regions started after the first servant is up (where wlm_minimumSRCount is set to a value greater than 1). The errors appear at 1-minute intervals after initialization and, after 15 minutes, at 5-minute intervals. You will continue to see these errors until the servant region starts processing work such as client requests, as described earlier.

Problem conclusion

The internal JMX connector that is used to communicate stop application and start application requests has been changed so that it no longer relies on WLM.

The following publication was revised as a result of APAR PQ85156:
________________________________________________________________
WebSphere Application Server V5 for z/OS
Messages and Codes
GA22-7915-01
________________________________________________________________
NOTE: Periodically, we refresh the documentation on our Web site, so the changes might have been made before you read this text. To access the latest online documentation, go to the product library page at:
www.ibm.com/software/webservers/appserv/zos_os390/library.html
________________________________________________________________
Chapter 03, pg. 179 (new minor code)

C9C25750
Explanation: Function not implemented
User Response: IBM Internal only
________________________________________________________________
Chapter 03, pg. 85 (new minor code)

Message identifier - 0xC9C25749
Explanation: Function not implemented
User Response: IBM Internal only. Users should not use the operation that produced this error.
________________________________________________________________
Chapter 03, pg. 85 (new minor code)

Message identifier - 0xC9C25748
Explanation: Controller region has given up waiting for an asynchronous JMX response because a servant region is taking too long to respond.
User Response: Users may attempt to resubmit the request or may opt to extend the timeout by adjusting the 'jmx.async.timeout' custom property. If the request continues to fail, the user should assume that the request cannot be completed and handle the error appropriately.
________________________________________________________________
Chapter 03, pg. 85 (new minor code)

Message identifier - 0xC9C25747
Explanation: Controller region has given up waiting for an asynchronous JMX response because the server is being shut down.
User Response: Users should not attempt to resubmit the request. The server is being stopped, so user code should attempt to clean up gracefully and exit.
________________________________________________________________
Chapter 03, pg. 85 (new minor code)

Message identifier - 0xC9C25746
Explanation: Controller region has given up waiting for an asynchronous JMX response because the thread that was waiting has been interrupted.
User Response: Users may attempt to resubmit the request. If the request continues to fail, the user should assume that the request cannot be completed and handle the error appropriately.
________________________________________________________________
****************************************************************
An article in the WebSphere Application Server InfoCenter will be updated to include information for this APAR.
To access the latest online documentation, go to:
http://publib.boulder.ibm.com/infocenter/wasinfo/index.jsp
----------------------------------------------------------------
The change includes information added to the "Example: Setting JVM Custom Properties" article (xrun_jvm.html), as follows:

jmx.asynch.timeout

The jmx.async.timeout custom property is associated with the asynchronous JMX connector. By setting this property, you can override the default timeout of 180 seconds (three minutes), which allows you to more closely control the behavior of the asynchronous JMX connector. If the timeout value is too short, JMX operations may fail to complete. If the timeout value is too long, JMX operations that take a long time to complete (for instance, because of a slow servant, lengthy MBean implementation, or server error) may slow down JMX clients.

Steps for this task

1. Connect to the administrative console and navigate to the Java Virtual Machine Custom Properties panel: Servers > Application Servers > server1 > Custom Properties
2. If the jmx.asynch.timeout property is not present in the list, create a new property with that name.
3. Enter the name and value.

APAR PQ85156 is associated with SERVICE LEVEL W502016 of WebSphere Application Server V5.0 for z/OS.

Temporary fix

Comments

**** PE04/12/01 FIX IN ERROR. SEE APAR PQ97797 FOR DESCRIPTION
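The timeout behavior and the user responses documented above for minor codes C9C25748, C9C25747, and C9C25746 can be pictured with a small Java sketch. This is illustration only: the asynchronous JMX connector is internal to the product, so the class name, method names, and return conventions below are hypothetical, and treating the configured timeout value as a number of seconds is an assumption based on the stated default of 180 seconds.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative sketch only; none of these names are WebSphere APIs. */
final class AsyncJmxTimeoutSketch {

    // Minor codes copied from the Messages and Codes entries added by this APAR.
    static final int MINOR_TIMED_OUT = 0xC9C25748;       // servant took too long
    static final int MINOR_SERVER_STOPPING = 0xC9C25747; // server is shutting down
    static final int MINOR_INTERRUPTED = 0xC9C25746;     // waiting thread interrupted

    // Default documented for the timeout custom property.
    static final long DEFAULT_TIMEOUT_SECONDS = 180L;

    /** Hypothetical helper: use the configured value if valid, else the default. */
    static long resolveTimeoutSeconds(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_TIMEOUT_SECONDS;
        }
        try {
            long value = Long.parseLong(configured.trim());
            return value > 0 ? value : DEFAULT_TIMEOUT_SECONDS;
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_SECONDS;
        }
    }

    /** Encodes the documented user responses: resubmit only after a timeout or interrupt. */
    static boolean shouldResubmit(int minorCode) {
        switch (minorCode) {
            case MINOR_TIMED_OUT:
            case MINOR_INTERRUPTED:
                return true;
            case MINOR_SERVER_STOPPING:
            default:
                return false;
        }
    }

    /**
     * Waits for a reply for at most timeoutSeconds.
     * Returns 0 on success, the matching minor code on timeout or interrupt,
     * and -1 (an arbitrary marker chosen for this sketch) for any other failure.
     */
    static int awaitReply(Future<?> reply, long timeoutSeconds) {
        try {
            reply.get(timeoutSeconds, TimeUnit.SECONDS);
            return 0;
        } catch (TimeoutException e) {
            return MINOR_TIMED_OUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return MINOR_INTERRUPTED;
        } catch (ExecutionException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        long timeout = resolveTimeoutSeconds(args.length > 0 ? args[0] : null);
        System.out.println("Effective timeout (seconds): " + timeout);

        // A reply that is already complete returns immediately.
        int done = awaitReply(CompletableFuture.completedFuture("stopped"), timeout);
        System.out.println("Completed reply, result code: " + done);

        // A reply that never arrives times out; 1 second keeps the demo short.
        int late = awaitReply(new CompletableFuture<String>(), 1L);
        System.out.println("Missing reply, resubmit? " + shouldResubmit(late));
    }
}
```

In this sketch a longer timeout makes the caller wait longer for a slow servant before giving up, and a shorter one returns the timeout code sooner, which mirrors the trade-off described for the custom property.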
APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

PQ89857

Modules/Macros

Publications Referenced
Document Information
Current web document: swg1PQ85156.html
Product categories: Software > Application Servers >
Distributed Application & Web Servers > WebSphere Application
Server for z/OS
Operating system(s):
Software version: 500
Software edition:
Reference #: PQ85156
IBM Group: Software Group
Modified date: Dec 15, 2004
(C) Copyright IBM Corporation 2000, 2009. All Rights Reserved.