PQ85156: IN A MULTI-SERVANT REGION ENVIRONMENT, ISSUING A STOP ON AN APPLICATION MAY NOT STOP THE APPLICATION ON ALL SERVANT REGIONS.

 A fix is available

APAR status
Closed as program error.

Error description
When issuing a stop on an application in a multi-SR environment,
the application in question may not stop on all servant
regions if work has not been dispatched to all servant regions.
On the servant regions where no work (for example, a client
request) has been dispatched, the stop is not issued, and the
application continues to run.  This behavior has also been
witnessed when a restart is issued on an application.  A
restart is done when an update is attempted on an application
from the admin console.  In this case, the application is only
restarted (i.e. stopped then started again) on one of the
servant regions.

A closely related problem, which is also addressed by
this APAR, is the incorrect setting of application status in
the admin console when WLM shuts down a servant region in a
multi-servant environment.  When this occurs, the application
status is reported erroneously as stopped, even though a
servant where the application is running is still available.

As a side effect of this problem, you might notice multiple
instances of the following error in servant regions started
after the first servant is up (where wlm_minimumSRCount is set
to a value greater than 1).  The errors appear after
initialization at 1-minute intervals and, after 15 minutes, at
5-minute intervals.  You will continue to see these errors until
the servant region starts processing client requests, as
described earlier.
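
The timing described above can be sketched as a short calculation.
This is only an illustration in Python. It assumes that the first
message appears one minute after initialization and that the
5-minute spacing starts right after the 15-minute mark. The function
name warning_offsets is hypothetical and is not part of the product.

```python
def warning_offsets(total_minutes):
    """Return the minute offsets (after initialization) at which the
    warning would be expected under the assumptions stated above."""
    offsets = []
    t = 1  # assumed: first occurrence one minute after initialization
    while t <= total_minutes:
        offsets.append(t)
        # 1-minute spacing up to the 15-minute mark, 5-minute spacing afterwards
        t += 1 if t < 15 else 5
    return offsets
```

Comparing the offsets from this sketch with the timestamps in a
servant region's log may help confirm that a run of BBOO0010W
messages matches this symptom rather than an unrelated failure.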

BBOO0010W The function CORBA::throw_sysexcp(const char*,ULong,Co
mpletionStatus)+886 raised CORBA system exception
CORBA::UNKNOWN.  Error code is C9C25790.

And more detailed tracing shows:

Trace: 2004/06/14 14:39:42.741 01 t=9DFB00 c=UNK key=P8
                                                      (13007002)
  FunctionName: com.ibm.ws390.orb.ClientDelegate
  SourceId: com.ibm.ws390.orb.ClientDelegate
  Category: EVENT
  ExtendedMessage: ClientDelegate.invoke;
java.rmi.RemoteException: controller unable to connect to
servant; nested exception is:
.com.ibm.websphere.management.exception.ConnectorNotAvailableExc
eption: Nonexistant stoken/IOR
com.ibm.websphere.management.exception.ConnectorNotAvailableExce
ption: Nonexistant stoken/IOR

And on the controller side:

Trace: 2004/06/19 00:25:54.227 01 t=8CE078 c=UNK key=S2
                                                      (13007002)
 FunctionName:
com.ibm.ws390.management.connector.corba.CorbaConnectorClient
 SourceId:
com.ibm.ws390.management.connector.corba.CorbaConnectorClient
 Category: DEBUG
 ExtendedMessage: throwing exception;com.ibm.websphere.managemen
t.exception.ConnectorNotAvailableException: Nonexistant
stoken/IOR
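
Because the trace formatter wraps long lines, sometimes in the
middle of a word, a plain text search can miss the exception text.
The following Python sketch shows one way to count occurrences
regardless of wrapping. The function name count_marker is
hypothetical, and the marker string is copied from the trace
exactly as it appears there, including its spelling.

```python
def count_marker(trace_text, marker="Nonexistant stoken/IOR"):
    """Count occurrences of marker in trace_text, ignoring all whitespace
    so that line wraps inside the marker do not hide a match."""
    def squash(s):
        # Remove every whitespace character, including newlines.
        return "".join(s.split())
    return squash(trace_text).count(squash(marker))
```

Removing all whitespace is a blunt approach and could in principle
join unrelated words into a false match, so treat the count as a
hint rather than an exact figure.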

ADDITIONAL SYMPTOM:
TPV may fail with the client-side message PMON3009W, indicating
a failure to connect. This error occurs when the server has more
than one servant region. In addition,
javax.management.MBeanException: Non-existant servant
exceptions are thrown on the client side. To see these
exceptions, run tperfviewer from a DOS window, specifying:
tperfviewer DEBUG host port protocol
where host, port, and protocol are optional parameters. Example:
tperfviewer DEBUG myhost 8879 SOAP
Local fix
Set wlm_minimumSRCount to 1 (the default). With this setting,
only one servant is started when the application server is
brought up. If the workload is more than one servant can handle,
WLM starts more servants as necessary.

The wlm_maximumSRCount variable has no effect on this
problem and can be set to whatever value suits your
environment.
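
A quick way to check whether the workaround is in place is to look
for the variable in the server's settings. The Python sketch below
assumes the settings are available as plain name=value text; that
format, and the function names, are assumptions made for
illustration only. A missing entry is treated as 1, the default
stated above.

```python
def minimum_sr_count(env_text):
    """Return the wlm_minimumSRCount value found in name=value text,
    or 1 (the documented default) if the variable is not present."""
    for line in env_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "wlm_minimumSRCount":
            return int(value.strip())
    return 1


def workaround_applied(env_text):
    """True when only one servant is started at server startup."""
    return minimum_sr_count(env_text) == 1
```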
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere Application Server    *
*                 V5.0 for z/OS                                *
****************************************************************
* PROBLEM DESCRIPTION: In a multi-servant region environment,  *
*                      issuing a stop on an application may    *
*                      not stop that application on all        *
*                      servant regions.                        *
*                                                              *
*                      "Admin" component tracing will show:    *
*                                                              *
*                      FunctionName:                           *
*                      com.ibm.ws390.orb.ClientDelegate        *
*                      SourceId:                               *
*                      com.ibm.ws390.orb.ClientDelegate        *
*                      Category: EVENT                         *
*                      ExtendedMessage: ClientDelegate.invoke; *
*                      java.rmi.RemoteException: controller    *
*                      unable to connect to servant; nested    *
*                      exception is: com.ibm.websphere.managem *
*                      ent.exception.ConnectorNotAvailableExce *
*                      ption: Nonexistant stoken/IOR           *
*                      com.ibm.websphere.management.exception. *
*                      ConnectorNotAvailableException:         *
*                      Nonexistant stoken/IOR and on the       *
*                      controller side                         *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
When issuing a stop on an application in a multi-SR environment,
the application in question may not stop on all servant
regions if work has not been dispatched to all servant regions.
On the servant regions where no work (for example, a client
request) has been dispatched, the stop is not issued, and the
application continues to run.  This behavior has also been
witnessed when a restart is issued on an application.  A
restart is done when an update is attempted on an application
from the admin console.  In this case, the application is only
restarted (i.e. stopped then started again) on one of the
servant regions.

A closely related problem, which is also addressed by
this APAR, is the incorrect setting of application status in
the admin console when WLM shuts down a servant region in a
multi-servant environment.  When this occurs, the application
status is reported erroneously as stopped, even though a
servant where the application is running is still available.

As a side effect of this problem, you might notice multiple
instances of the "Nonexistant stoken" error in servant regions
started after the first servant is up (where wlm_minimumSRCount
is set to a value greater than 1).  The errors appear after
initialization at 1-minute intervals and, after 15 minutes, at
5-minute intervals.  You will continue to see these errors until
the servant region starts processing client requests, as
described earlier.
Problem conclusion
The internal JMX connector that is used to communicate
stop application / start application requests has been changed
so that it is no longer reliant on WLM.
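
The difference between the reported symptom and the intended
behavior can be pictured with a toy model. The Python sketch below
is not the connector's implementation; the class and function
names are hypothetical, and the model only restates the behavior
described in this APAR: before the fix, a stop reaches only the
servant regions that have already had work dispatched to them.

```python
from dataclasses import dataclass


@dataclass
class Servant:
    name: str
    has_dispatched_work: bool  # True once at least one request was routed here
    app_running: bool = True


def stop_before_fix(servants):
    """Symptom: the stop reaches only servants that already received work."""
    for servant in servants:
        if servant.has_dispatched_work:
            servant.app_running = False


def stop_after_fix(servants):
    """Intended outcome: the stop reaches every servant region."""
    for servant in servants:
        servant.app_running = False


def still_running(servants):
    """Names of servants on which the application is still running."""
    return [s.name for s in servants if s.app_running]
```

With two servants, of which only the first has received work,
stop_before_fix leaves the application running on the second one,
while stop_after_fix stops it on both.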

The following publication was revised as a result
of APAR PQ85156:
________________________________________________________________
WebSphere Application Server V5 for z/OS
Messages and Codes
GA22-7915-01
________________________________________________________________

NOTE: Periodically, we refresh the documentation on our
Web site, so the changes might have been made before you
read this text. To access the latest on-line
documentation, go to the product library page at:

www.ibm.com/software/webservers/appserv/zos_os390/library.html

________________________________________________________________
Chapter 03, pg. 179 (new minor code)

C9C25750

Explanation: Function not implemented

User Response: IBM Internal only
________________________________________________________________
Chapter 03, pg. 85 (new minor code)
Message identifier - 0xC9C25749
Explanation: Function not implemented
User Response: IBM Internal only.
Users should not use the operation that produced this error.
________________________________________________________________
Chapter 03, pg. 85 (new minor code)
Message identifier - 0xC9C25748
Explanation: Controller region has given up waiting for an
asynchronous JMX response because a servant region is taking too
long to respond.
User Response: Users may attempt to resubmit the request or
may opt to extend the timeout by adjusting the
'jmx.async.timeout' custom property.  If the request continues
to produce a failure then the user should assume the request
cannot be completed and handle the error appropriately.
________________________________________________________________
Chapter 03, pg. 85 (new minor code)
Message identifier - 0xC9C25747
Explanation: Controller region has given up waiting for an
asynchronous JMX response because the server is being shut down.
User Response: Users should not attempt to resubmit the request.
The server is being stopped, so user code should attempt to
clean up gracefully and exit.
________________________________________________________________
Chapter 03, pg. 85 (new minor code)
Message identifier - 0xC9C25746
Explanation: Controller region has given up waiting for an
asynchronous JMX response because the thread that was waiting
has been interrupted.
User Response: Users may attempt to resubmit the request.
If the request continues to produce a failure then the user
should assume the request cannot be completed and handle the
error appropriately.
________________________________________________________________
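
For client code that has to react to these minor codes, the user
responses listed above can be condensed into a small lookup. The
Python sketch below is an illustration only. The table contents are
transcribed from the entries above, the function names are
hypothetical, and any code that is not in the table is treated as
not safe to resubmit.

```python
# minor code -> (short explanation, whether the request may be resubmitted)
MINOR_CODES = {
    0xC9C25750: ("Function not implemented", False),
    0xC9C25749: ("Function not implemented", False),
    0xC9C25748: ("Timed out waiting for a slow servant region", True),
    0xC9C25747: ("Server is being shut down", False),
    0xC9C25746: ("Waiting thread was interrupted", True),
}


def parse_minor_code(text):
    """Accept 'C9C25748' or '0xC9C25748' and return the integer value."""
    return int(text.strip(), 16)


def may_resubmit(text):
    """True only for codes whose user response allows resubmitting."""
    entry = MINOR_CODES.get(parse_minor_code(text))
    return entry is not None and entry[1]
```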

****************************************************************

An article in the WebSphere Application Server InfoCenter will
be updated to include information for this APAR. To access the
latest online documentation, go to:


http://publib.boulder.ibm.com/infocenter/wasinfo/index.jsp

----------------------------------------------------------------

The change includes information added to the "Example: Setting
JVM Custom Properties" article (xrun_jvm.html), as follows:

jmx.asynch.timeout

The jmx.async.timeout custom property is associated with the
asynchronous JMX connector. By setting this property, you can
override the default timeout of 180 seconds (three minutes),
which allows you to more closely control the behavior of the
asynchronous JMX connector. If the timeout value is too short,
JMX operations may fail to complete. If the timeout value is
too long, JMX operations that take a long time to complete
(for instance, because of a slow servant, lengthy MBean
implementation, or server error) may slow down JMX clients.
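
The trade-off described above is the usual one for any bounded wait
on an asynchronous reply. The Python sketch below illustrates the
idea in general terms; it does not use or reproduce the WebSphere
connector. The function names are hypothetical, and the 180-second
default is taken from the text above.

```python
import concurrent.futures
import time

DEFAULT_TIMEOUT_SECONDS = 180  # default stated in the article text


def call_with_timeout(operation, timeout_seconds=DEFAULT_TIMEOUT_SECONDS):
    """Run operation in a worker thread and wait at most timeout_seconds.

    Returns ("ok", result) on success or ("timeout", None) if no
    result arrived in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(operation)
    try:
        return ("ok", future.result(timeout=timeout_seconds))
    except concurrent.futures.TimeoutError:
        return ("timeout", None)
    finally:
        # Do not block the caller on the still-running worker.
        pool.shutdown(wait=False)


def slow_servant():
    """Stand-in for a servant that takes 0.3 seconds to respond."""
    time.sleep(0.3)
    return "done"
```

In this sketch, call_with_timeout(slow_servant, 0.05) gives up and
returns ("timeout", None), which corresponds to a timeout that is
too short for the operation. A very large timeout would instead
keep the caller waiting for as long as the operation takes.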


Steps for this task

1. Connect to the administrative console and navigate to the
   Java Virtual Machine Custom Properties panel:

   Servers > Application Servers > server1 > Custom Properties

2. If the jmx.asynch.timeout property is not present in the
   list, create a new property name.

3. Enter the name and value.

APAR PQ85156 is associated with SERVICE LEVEL W502016 of
WebSphere Application Server V5.0 for z/OS.
Temporary fix
Comments
**** PE04/12/01 FIX IN ERROR. SEE APAR PQ97797 FOR DESCRIPTION
APAR information
APAR number PQ85156
Reported component name WEBSPHERE FOR Z
Reported component ID 5655I3500
Reported release 500
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Special Attention NoSpecatt
Submitted date 2004-02-25
Closed date 2004-10-07
Last modified date 2004-12-15

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:
PQ89857

Modules/Macros
BBOUBINF          

Publications Referenced
GA22791501        

Fix information
Fixed component name WEBSPHERE FOR Z
Fixed component ID 5655I3500

Applicable component levels
R500 PSY UQ93769    UP04/10/15 P F410

Document Information


Current web document: swg1PQ85156.html
Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server for z/OS
Operating system(s):
Software version: 500
Software edition:
Reference #: PQ85156
IBM Group: Software Group
Modified date: Dec 15, 2004