PK20881: APPLICATION SERVER HANGS DUE TO DEADLOCK WHEN MULTIPLE APPLICATION SERVERS ARE INVOLVED IN A LONG-RUNNING TRANSACTION

 Fixes are available

5.1.1.17: WebSphere Application Server V5.1.1 Cumulative Fix 17 for AIX
5.1.1.17: WebSphere Application Server V5.1.1 Cumulative Fix 17 for HP-UX
5.1.1.19: WebSphere Application Server V5.1.1 Cumulative Fix 19 for Linux
5.1.1.16: WebSphere Application Server V5.1.1 Cumulative Fix 16 for AIX
5.1.1.18: WebSphere Application Server V5.1.1 Cumulative Fix 18 for AIX
5.1.1.18: WebSphere Application Server V5.1.1 Cumulative Fix 18 for HP-UX
5.1.1.18: WebSphere Application Server V5.1.1 Cumulative Fix 18 for Solaris
5.1.1.18: WebSphere Application Server V5.1.1 Cumulative Fix 18 for Windows
5.1.1.18: WebSphere Application Server V5.1.1 Cumulative Fix 18 for Linux
5.1.1.17: WebSphere Application Server V5.1.1 Cumulative Fix 17 for Linux
5.1.1.17: WebSphere Application Server V5.1.1 Cumulative Fix 17 for Solaris
5.1.1.17: WebSphere Application Server V5.1.1 Cumulative Fix 17 for Windows
5.1.1.19: WebSphere Application Server V5.1.1 Cumulative Fix 19 for AIX
5.1.1.19: WebSphere Application Server V5.1.1 Cumulative Fix 19 for Windows
5.1.1.11: WebSphere Application Server V5.1.1 Cumulative Fix 11 for AIX
5.1.1.12: WebSphere Application Server V5.1.1 Cumulative Fix 12 for Windows
5.1.1.11: WebSphere Application Server V5.1.1 Cumulative Fix 11 for Windows
5.1.1.16: WebSphere Application Server V5.1.1 Cumulative Fix 16 for Solaris
5.1.1.11: WebSphere Application Server V5.1.1 Cumulative Fix 11 for Linux
5.1.1.16: WebSphere Application Server V5.1.1 Cumulative Fix 16 for Windows
5.1.1.14: WebSphere Application Server V5.1.1 Cumulative Fix 14 for Solaris
5.1.1.12: WebSphere Application Server V5.1.1 Cumulative Fix 12 for AIX
5.1.1.12: WebSphere Application Server V5.1.1 Cumulative Fix 12 for Linux
5.1.1.12: WebSphere Application Server V5.1.1 Cumulative Fix 12 for HP-UX
5.1.1.12: WebSphere Application Server V5.1.1 Cumulative Fix 12 for Solaris
5.1.1.11: WebSphere Application Server V5.1.1 Cumulative Fix 11 for Solaris
5.1.1.13: WebSphere Application Server V5.1.1 Cumulative Fix 13 for AIX
5.1.1.13: WebSphere Application Server V5.1.1 Cumulative Fix 13 for Windows
5.1.1.13: WebSphere Application Server V5.1.1 Cumulative Fix 13 for HP-UX
5.1.1.15: WebSphere Application Server V5.1.1 Cumulative Fix 15 for Solaris
5.1.1.13: WebSphere Application Server V5.1.1 Cumulative Fix 13 for Solaris
5.1.1.13: WebSphere Application Server V5.1.1 Cumulative Fix 13 for Linux
5.1.1.14: WebSphere Application Server V5.1.1 Cumulative Fix 14 for AIX
5.1.1.14: WebSphere Application Server V5.1.1 Cumulative Fix 14 for Linux
5.1.1.14: WebSphere Application Server V5.1.1 Cumulative Fix 14 for Windows
5.1.1.15: WebSphere Application Server V5.1.1 Cumulative Fix 15 for Windows
5.1.1.11: WebSphere Application Server V5.1.1 Cumulative Fix 11 for HP-UX
5.1.1.14: WebSphere Application Server V5.1.1 Cumulative Fix 14 for HP-UX
5.1.1.15: WebSphere Application Server V5.1.1 Cumulative Fix 15 for AIX
5.1.1.15: WebSphere Application Server V5.1.1 Cumulative Fix 15 for HP-UX
5.1.1.16: WebSphere Application Server V5.1.1 Cumulative Fix 16 for HP-UX
5.1.1.16: WebSphere Application Server V5.1.1 Cumulative Fix 16 for Linux
5.1.1.15: WebSphere Application Server V5.1.1 Cumulative Fix 15 for Linux
5.1.1.19: WebSphere Application Server V5.1.1 Cumulative Fix 19 for HP-UX



APAR status
Closed as program error.

Error description
In WebSphere Application Server V5.1.x, an application server
may hang due to a Java deadlock.  The threads involved in the
deadlock can be seen in a thread dump:
.
"ORB.thread.pool : 3":
  waiting to lock monitor 0x000f1720 (object 0xe7b7cda0, a com.ibm.ws.Transaction.JTA.TransactionImpl),
  which is held by "Alarm : 2"
"Alarm : 2":
  waiting to lock monitor 0x000f1838 (object 0xe7b7d168, a com.ibm.ws.Transaction.JTS.TransactionWrapper),
  which is held by "ORB.thread.pool : 3"
.
"ORB.thread.pool : 3":
 at com.ibm.ws.Transaction.JTA.TransactionImpl.addAssociation(TransactionImpl.java:2673)
 - waiting to lock <0xe7b7cda0> (a com.ibm.ws.Transaction.JTA.TransactionImpl)
 at com.ibm.ws.Transaction.JTS.TransactionWrapper.rollback(TransactionWrapper.java:548)
 - locked <0xe7b7d168> (a com.ibm.ws.Transaction.JTS.TransactionWrapper)
 at com.ibm.ws.Transaction.JTS.WSCoordinatorImpl.rollback(WSCoordinatorImpl.java:163)
...
.
"Alarm : 2":
 at com.ibm.ws.Transaction.JTS.TransactionWrapper.destroy(TransactionWrapper.java:841)
 - waiting to lock <0xe7b7d168> (a com.ibm.ws.Transaction.JTS.TransactionWrapper)
 at com.ibm.ws.Transaction.JTA.TransactionImpl.forgetTransaction(TransactionImpl.java:2528)
 at com.ibm.ws.Transaction.JTA.TransactionImpl.notifyCompletion(TransactionImpl.java:2507)
 - locked <0xe7b7cda0> (a com.ibm.ws.Transaction.JTA.TransactionImpl)
 at com.ibm.ws.Transaction.JTA.TransactionImpl.rollback(TransactionImpl.java:1176)
...
.
This may occur when one application server (referred to as the
superior server) starts a transaction and then asks another
application server (referred to as the subordinate server) to
perform some work on the transaction.  When the subordinate
server finishes its work, it informs the superior server and
waits up to the "Client inactivity timeout" (60 seconds by
default) for a response.  If that timeout is reached without a
response from the superior server, the subordinate server
attempts to time out and roll back the transaction.  The problem
occurs when, shortly after this, the superior server sends a
request to the subordinate server to roll back the transaction.
Two threads on the subordinate server then try to roll back the
same transaction at the same time, and they deadlock.
Local fix
The problem can be avoided if the "Client inactivity timeout" on
the subordinate server is set to a value higher than the "Total
transaction lifetime timeout" on the superior server.  The
superior server then times out transactions before the
subordinate server does, so the deadlock condition cannot
occur on the subordinate server.
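
The rule above can be written as a one-line check. The following is only a sketch: the class and method names are hypothetical, the values 120 and 180 are illustrative, and it ignores any special meaning the product may give to particular values. It does not read or change any WebSphere configuration.

```java
public class TimeoutCheck {
    /**
     * Returns true when the subordinate's "Client inactivity timeout" is
     * strictly greater than the superior's "Total transaction lifetime
     * timeout". Both values are in seconds.
     */
    static boolean avoidsDeadlockWindow(int subordinateClientInactivitySecs,
                                        int superiorTotalLifetimeSecs) {
        return subordinateClientInactivitySecs > superiorTotalLifetimeSecs;
    }

    public static void main(String[] args) {
        // 60 is the default named in the description; 120 and 180 are
        // illustrative values chosen for this sketch.
        System.out.println(avoidsDeadlockWindow(60, 120));   // false
        System.out.println(avoidsDeadlockWindow(180, 120));  // true
    }
}
```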
Problem summary
****************************************************************
* USERS AFFECTED: This problem affects users of the Java       *
*                 Transaction Service provided with            *
*                 WebSphere Application Server Version         *
*                 5.1.1.x.                                     *
****************************************************************
* PROBLEM DESCRIPTION: Transactions can span multiple          *
*                      application servers. The application    *
*                      server which initiates the transaction  *
*                      is known as the Superior server,        *
*                      while any application servers that are  *
*                      asked to participate in the transaction *
*                      are known as Subordinate servers.       *
*                                                              *
*                      After a Subordinate has completed its   *
*                      work in a transaction, it waits for a   *
*                      response from the Superior. The amount  *
*                      of time the Subordinate server waits    *
*                      for this response is specified by the   *
*                      "Client inactivity timeout" property.   *
*                      If this timeout occurs, the             *
*                      Subordinate server will notify the      *
*                      Superior server, and roll back the      *
*                      work it has performed.                  *
*                                                              *
*                      If the "Client inactivity timeout"      *
*                      occurs at the same time as the          *
*                      Superior server issues a rollback       *
*                      request, a deadlock occurs on the       *
*                      Subordinate server, resulting in        *
*                      hung ORB threads.                       *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
When the problem occurred, there were two threads trying to
roll back the same transaction on the Subordinate server. The
first thread was using a TransactionImpl object, but was
blocked waiting for a TransactionWrapper object to become
available so that it could call TransactionWrapper.destroy().
However, the second thread was using this TransactionWrapper,
and was waiting for the TransactionImpl object used by the
first thread to become free.
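
The lock ordering described above can be reproduced in isolation. The following is a minimal sketch, not the product code: the two plain objects are hypothetical stand-ins for the TransactionImpl and TransactionWrapper monitors, the thread names are made up, and it uses current Java syntax rather than the JDK level shipped with V5.1.1. A latch forces each thread to hold its first monitor before requesting the second, and the JVM's own monitor-deadlock detection confirms the result.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class RollbackDeadlockSketch {
    // Hypothetical stand-ins for the TransactionImpl and
    // TransactionWrapper monitors seen in the thread dump.
    static final Object impl = new Object();
    static final Object wrapper = new Object();

    /** Starts the two racing rollbacks; returns true if the JVM reports a monitor deadlock. */
    static boolean reproduce() {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Timeout path (the "Alarm" thread): holds impl, then needs wrapper.
        Thread alarm = new Thread(() -> {
            synchronized (impl) {
                awaitBoth(bothHoldFirstLock);
                synchronized (wrapper) { /* stands in for destroy() */ }
            }
        }, "Alarm-sketch");

        // Superior's rollback request (the ORB thread): holds wrapper, then needs impl.
        Thread orb = new Thread(() -> {
            synchronized (wrapper) {
                awaitBoth(bothHoldFirstLock);
                synchronized (impl) { /* stands in for addAssociation() */ }
            }
        }, "ORB-sketch");

        // Daemon threads let the JVM exit even though they never finish.
        alarm.setDaemon(true);
        orb.setDaemon(true);
        alarm.start();
        orb.start();

        // Poll the JVM's monitor-deadlock detector for up to about 5 seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return true;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Signals that this thread holds its first monitor, then waits for the other thread.
    static void awaitBoth(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```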
Problem conclusion
The TransactionWrapper.destroy() method has been changed so
that it is no longer synchronized. This allows the first
thread to continue its work. When it has finished
processing the transaction rollback, the second thread is
free to continue.
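
The effect of removing the synchronization can be sketched as follows. This is an illustration only, under stated assumptions: the Impl and Wrapper classes are hypothetical stand-ins that share only method names with the real TransactionImpl and TransactionWrapper, and the latch exists purely to force the racing interleaving. Because destroy() needs no monitor, the thread holding the Impl monitor finishes and releases it, after which the other thread can proceed.

```java
import java.util.concurrent.CountDownLatch;

public class UnsynchronizedDestroySketch {
    // Hypothetical stand-in for TransactionWrapper.
    static class Wrapper {
        final CountDownLatch latch;
        Impl impl;
        Wrapper(CountDownLatch latch) { this.latch = latch; }

        // Holds the Wrapper monitor, then needs the Impl monitor.
        synchronized void rollback() {
            awaitBoth(latch);
            impl.addAssociation();
        }

        // Not synchronized: callers do not need the Wrapper monitor.
        void destroy() { }
    }

    // Hypothetical stand-in for TransactionImpl.
    static class Impl {
        final CountDownLatch latch;
        Wrapper wrapper;
        Impl(CountDownLatch latch) { this.latch = latch; }

        // Holds the Impl monitor, then calls the unsynchronized destroy().
        synchronized void rollback() {
            awaitBoth(latch);
            wrapper.destroy();
        }

        synchronized void addAssociation() { }
    }

    // Signals that this thread holds its first monitor, then waits for the other thread.
    static void awaitBoth(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs both rollbacks concurrently; returns true if both threads finish. */
    static boolean run() {
        CountDownLatch latch = new CountDownLatch(2);
        Wrapper wrapper = new Wrapper(latch);
        Impl impl = new Impl(latch);
        wrapper.impl = impl;
        impl.wrapper = wrapper;

        Thread alarm = new Thread(impl::rollback, "Alarm-sketch");
        Thread orb = new Thread(wrapper::rollback, "ORB-sketch");
        alarm.setDaemon(true);
        orb.setDaemon(true);
        alarm.start();
        orb.start();
        try {
            alarm.join(5000);
            orb.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !alarm.isAlive() && !orb.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both rollbacks completed: " + run());
    }
}
```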

The fix for this APAR is currently targeted for inclusion in
Cumulative Fix 11 for WebSphere Application Server Version
5.1.1. Please refer to the Recommended Updates page
for delivery dates:

http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg27004980
Temporary fix
Comments
APAR information
APAR number PK20881
Reported component name WAS NETWRK DEPL
Reported component ID 5630A3601
Reported release 10A
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Special Attention NoSpecatt
Submitted date 2006-03-06
Closed date 2006-03-30
Last modified date 2006-04-03

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:
PK22540

Modules/Macros
TRANWRAP          

Publications Referenced

Fix information
Fixed component name WAS NETWRK DEPL
Fixed component ID 5630A3601

Applicable component levels
R003 PSN    UP
R00A PSN    UP
R00H PSN    UP
R00I PSN    UP
R00P PSN    UP
R00S PSN    UP
R00W PSN    UP
R103 PSY    UP
R10A PSY    UP
R10H PSY    UP
R10I PSY    UP
R10P PSY    UP
R10S PSY    UP
R10W PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 10A
Software edition:
Reference #: PK20881
IBM Group: Software Group
Modified date: Apr 3, 2006