PQ96700: NMSV0015E BY ADMINSERVER DUE TO A JAVA.LANG.NULLPOINTEREXCEPTION AFTER AN APPSERVER RESTART OR OTHER NAME SPACE UPDATE

APAR status
Closed as program error.

Error description
This is an intermittent problem that only seems to happen after
an AppServer crash or some other event causes a change in the
status of the name space (status of servers).
.
The problem causes the AdminServer to stop. It is eventually
resolved by stopping and restarting the AdminServer several
times.
.
The following messages are logged in the tracefile (the system
out file of the AdminServer):
10/20/04 10:10:56:347 CDT  2ce9e130 EJSAdminRepos W WWLM0028: Method getServerGroupFromRepository encountered an unexpected exception.
java.lang.NullPointerException
 at com.ibm.ws.wlm.admin.config.EJSAdminRepositoryServer.getServerGroupFromRepository(EJSAdminRepositoryServer.java:111)
 at com.ibm.ws.wlm.server.config.WLMTemplateImpl.pull(WLMTemplateImpl.java:199)
 at com.ibm.ws.wlm.server.config._WLMTemplateImpl_Tie._invoke(_WLMTemplateImpl_Tie.java:95)
 at com.ibm.CORBA.iiop.ExtendedServerDelegate.dispatch(ExtendedServerDelegate.java:532)
 at com.ibm.CORBA.iiop.ORB.process(ORB.java:2450)
 at com.ibm.CORBA.iiop.OrbWorker.run(OrbWorker.java:186)
 at com.ibm.ejs.oa.pool.ThreadPool$PooledWorker.run(ThreadPool.java:104)
 at com.ibm.ws.util.CachedThread.run(ThreadPool.java:144)
.
This eventually leads to NMSV0015E being logged.
.
This problem is caused by the WAS Workload Management component
routing a request to the Naming Service based on a previously
created routing table.  That is, this problem depends on the
ports being used by WAS, in particular the bootstrap port, being
statically defined.  This is what allows a request to reach the
name service before it is fully initialized.  It can only happen
on a restart.

Local fix

Problem summary
****************************************************************
* USERS AFFECTED: Users of WebSphere Application Server        *
*                 V4.0.6 and V4.0.7                            *
****************************************************************
* PROBLEM DESCRIPTION: In a clustered adminServer              *
*                      environment, if one adminServer goes    *
*                      down abnormally and is started again,   *
*                      there is a small window in which        *
*                      WorkLoad Management (WLM) can route     *
*                      Naming operation requests to it         *
*                      before the Naming service completes     *
*                      its initialization. Consequently, a     *
*                      NamingContext object is not created     *
*                      and initialized properly. All           *
*                      subsequent calls on that NamingContext  *
*                      then fail, including the internal       *
*                      calls that initialize the Naming        *
*                      beans, which fail with a                *
*                      NullPointerException. As a result,      *
*                      the adminServer won't restart.          *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
In a clustered adminServer environment, if one adminServer goes
down abnormally and is started again, there is a small window
in which WLM can route Naming operation requests to it before
the Naming service completes its initialization. Because the
NamingContext creation/initialization then happens before the
Naming beans are initialized, the NamingContext object is not
created and initialized properly. All subsequent calls on that
NamingContext then fail, including the internal calls that
initialize the Naming beans, which fail with a
NullPointerException. As a result, the adminServer won't
restart.
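
The failure mode described above can be illustrated with a
minimal Java sketch. This is a simplified model, not the actual
WebSphere code: the class names (PrematureContextSketch,
NamingBeans, NamingContext) and the static "beans" field are
hypothetical stand-ins chosen for illustration. It assumes only
that a context built before initialization keeps a null
reference and therefore stays broken afterwards.

```java
import java.util.HashMap;
import java.util.Map;

public class PrematureContextSketch {

    /** Hypothetical stand-in for the Naming beans that back a context. */
    static class NamingBeans {
        private final Map<String, String> bindings = new HashMap<>();
        void bind(String name, String value) { bindings.put(name, value); }
        String lookup(String name) { return bindings.get(name); }
    }

    /** Assigned only once (hypothetical) Naming service initialization completes. */
    static NamingBeans beans = null;

    /** Hypothetical context that captures the beans reference at construction time. */
    static class NamingContext {
        private final NamingBeans captured = beans; // null if constructed too early

        String lookup(String name) {
            return captured.lookup(name); // NullPointerException when captured is null
        }
    }

    /** Returns true if a lookup on the given context fails with a NullPointerException. */
    static boolean lookupFails(NamingContext ctx) {
        try {
            ctx.lookup("any");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        NamingContext early = new NamingContext(); // request arrived before initialization
        beans = new NamingBeans();                 // initialization completes afterwards
        NamingContext late = new NamingContext();  // constructed after initialization
        System.out.println(lookupFails(early));    // early context stays broken
        System.out.println(lookupFails(late));     // late context works
    }
}
```

In this model the early context never recovers, even after
initialization finishes, which mirrors why every later call on
the prematurely created NamingContext fails.
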
Problem conclusion
To fix this problem, a check was added to detect a premature
NamingContext constructor call. If the Naming service
initialization is still in progress, an incoming naming
operation request on that NamingContext causes an
org.omg.CORBA.COMM_FAILURE exception to be thrown. Once WLM
catches the COMM_FAILURE exception, it fails the request over
to another adminServer.
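
The guard-and-failover behavior described above can be sketched
roughly as follows. This is an illustrative model under stated
assumptions, not the shipped fix: NamingFailoverSketch,
NamingService, route and the server names are hypothetical, and
CommFailure is a local stand-in for org.omg.CORBA.COMM_FAILURE
so that the example compiles without an ORB on the class path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class NamingFailoverSketch {

    /** Hypothetical stand-in for org.omg.CORBA.COMM_FAILURE. */
    static class CommFailure extends RuntimeException {
        CommFailure(String message) { super(message); }
    }

    /** Hypothetical model of the Naming service on one adminServer. */
    static class NamingService {
        private final String name;
        private final AtomicBoolean initialized = new AtomicBoolean(false);

        NamingService(String name) { this.name = name; }

        void finishInitialization() { initialized.set(true); }

        /** Guard: reject requests that arrive before initialization completes. */
        String resolve(String jndiName) {
            if (!initialized.get()) {
                throw new CommFailure(name + ": Naming service initialization in progress");
            }
            return name + ":" + jndiName;
        }
    }

    /** Hypothetical WLM-style router: try servers in order, fail over on CommFailure. */
    static String route(List<NamingService> servers, String jndiName) {
        CommFailure last = null;
        for (NamingService server : servers) {
            try {
                return server.resolve(jndiName);
            } catch (CommFailure e) {
                last = e; // this server is not ready yet, so try the next one
            }
        }
        throw last != null ? last : new CommFailure("no adminServer available");
    }

    public static void main(String[] args) {
        NamingService restarting = new NamingService("adminA"); // still initializing
        NamingService healthy = new NamingService("adminB");
        healthy.finishInitialization();
        // The request fails over from adminA to adminB.
        System.out.println(route(Arrays.asList(restarting, healthy), "ejb/Example"));
    }
}
```

Rejecting the early request outright, rather than letting it
construct a half-initialized context, keeps the restarting
server clean while the request is served elsewhere.
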

Temporary fix

Comments

APAR information
APAR number PQ96700
Reported component name WEBSPHERE AE AI
Reported component ID 5630A2200
Reported release 400
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2004-11-03
Closed date 2004-11-10
Last modified date 2004-11-10

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
NAMING          

Fix information

Applicable component levels
R400 PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 400
Software edition:
Reference #: PQ96700
IBM Group: Software Group
Modified date: Nov 10, 2004