PQ58462: DATASHARING=0 CAUSE ACTIVATE HANG IN SYSPLEX ENV
APAR status

Closed as program error.

Error description

When attempting to activate a new conversation in a multi-system environment, a hang condition was encountered. The customer had turned off DATASHARING in their environment file, so WebSphere did not know about the servers on the other system when trying to locate an object in one of those servers. A hang condition is an inappropriate response.

Output that might help identify the issue:

08 17:59:28.340 01 SYSTEM=ROSE SERVER=SYSMGT01 JobName=PBOSMS ASID=0X0093 PID=0X04010078 TID=0X28CEF000 0X000018 c=12.1 ./bbocsess.cpp+4342 ... BBOU0639E Function connect() failed with RV=-1, RC=1116, RSN=766303FF, EDC8116I Address not available. hostname/ip: 163.231.162.25 port: .

Note the blank port value.

Local fix

Set DATASHARING=1 to resolve this issue.

Problem summary

****************************************************************
* USERS AFFECTED: All users of WebSphere Application Server    *
*                 V4.0.1 for z/OS and OS/390.                  *
****************************************************************
* PROBLEM DESCRIPTION: Server region hangs if outbound locate  *
*                      fails in the Control Region             *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************

An outbound locate request from a Server region did not receive a response. The comm_outbound_locate routine (bboocomm.cpp) keeps the Server region thread waiting for a response. It had built and queued an ACRW into the Control region for ACRW_TYPE_OUTBOUND_LOCATE_REQUEST. In the Control region, routine comm_outbound_locate (module bboocomm.cpp) sent the locate request to the local daemon. Because the daemon could not find the target server, it responded with a partially built IOR, which causes a failure on the next locate forward request (only done for WebSphere internal objects). The returned IOR had a port of 0 in all the protocol tags.

Method comm_inbound_response handles the locate response returned from the daemon. When it attempted a locate on the new IOR it received from the daemon, it encountered the comm error: "Function connect() failed with RV=-1, RC=1116, RSN=766303FF." Method comm_inbound_response does not have logic to account for this failure. It assumes that an asynchronous response will arrive for the call made to comm_outbound_locate. Thus, when it returns to the WebSphere execution thread (bbooboat.cpp), there is no indication that this locate request is done. The execution thread assumes that another locate request was issued and that the server region thread must continue to wait for the outcome. Because there is no outstanding locate request, the server region thread is hung.

Problem conclusion

Code has been modified in routine comm_inbound_response (module bboocomm.cpp) to propagate an indication of a comm_outbound_locate failure back to the execution thread (bbooboat.cpp). In this event, the execution thread, ACR_ExecutionThread::RemoveAndProcessWork, posts the server region thread.

APAR PQ58462 is associated with SERVICE LEVEL W401033 of WebSphere Application Server V4.0.1 for z/OS and OS/390.

Temporary fix

Comments

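The hang and its correction described under Problem summary and Problem conclusion can be illustrated with a small, self-contained C++ sketch. It is not the WebSphere code: LocateRequest, forward_locate, handle_locate_response, and waiter_is_posted are hypothetical names chosen for illustration, standard C++ threads stand in for the server region and control region, and a wait timeout stands in for the hang so that the example terminates. The assumption shown is only the general pattern: if a failed forwarded locate is not reported to the waiter, nothing ever posts it.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical outcome of one locate request (illustrative names only).
enum class LocateStatus { Pending, Succeeded, Failed };

// Hypothetical stand-in for the work element the server region thread waits on.
struct LocateRequest {
    std::mutex m;
    std::condition_variable cv;
    LocateStatus status = LocateStatus::Pending;

    // Post the waiting thread with a final status.
    void post(LocateStatus s) {
        assert(s != LocateStatus::Pending);  // only final states may be posted
        {
            std::lock_guard<std::mutex> lock(m);
            status = s;
        }
        cv.notify_all();
    }

    // Wait until posted or until the timeout expires.
    // A timeout (return value false) stands in for the hang.
    bool wait_posted(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m);
        return cv.wait_for(lock, timeout,
                           [this] { return status != LocateStatus::Pending; });
    }
};

// Stand-in for the forwarded locate: the connect fails when the IOR port is 0.
bool forward_locate(int port) { return port != 0; }

// Stand-in for the response handler. With propagate_failure == false it mimics
// the pre-fix behavior: a failed forwarded locate is silently dropped.
void handle_locate_response(LocateRequest& req, int forwarded_port,
                            bool propagate_failure) {
    if (forward_locate(forwarded_port)) {
        // Simplification: treat a successful forward as immediately complete.
        req.post(LocateStatus::Succeeded);
        return;
    }
    if (propagate_failure) {
        req.post(LocateStatus::Failed);  // post-fix: the waiter learns of the failure
    }
    // Pre-fix: nothing is posted, so the waiter keeps waiting.
}

// Runs one scenario with a port-0 IOR and reports whether the waiter was posted.
bool waiter_is_posted(bool propagate_failure) {
    LocateRequest req;
    std::thread control(
        [&] { handle_locate_response(req, 0, propagate_failure); });
    bool posted = req.wait_posted(std::chrono::milliseconds(200));
    control.join();
    return posted;
}
```

Calling waiter_is_posted(false) models the behavior before the fix, where the wait ends only because of the sketch's timeout; waiter_is_posted(true) models the corrected behavior, where the failure indication posts the waiting thread.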
APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

UQ64351

Modules/Macros

Document Information

Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server for z/OS
Operating system(s):
Software version: 401
Software edition:
Reference #: PQ58462
IBM Group: Software Group
Modified date: Apr 2, 2002
(C) Copyright IBM Corporation 2000, 2006. All Rights Reserved.