|
Problem(Abstract) |
J2CA0045E and J2CA0020E errors can be caused by many
problems. They are showing a time-out condition where a resource or a
managed connection is not available to fulfill a connection request.
In this technote we will use connection information in WebSphere® trace
files to troubleshoot J2CA0045E and J2CA0020E or connection wait time-out
problems. |
|
|
|
Cause |
In a trace file with
WAS.j2c=all=enabled:RRA=all=enabled:WAS.database=all=enabled, you will get
dumps of the connection pooling that look like the following:
[7/20/05 23:40:01:450 EDT] 6a8068fa d UOW=
source=com.ibm.ejs.j2c.poolmanager.PoolManager org=IBM prod=WebSphere
component=Application Server reserve(), Pool contents ==> PoolManager
name:jdbc/xxxxxx
PoolManager object:1424371533
Total number of connections: 45 (max/min 45/20)
Pool Bucket configuration
Shared Buckets: 200
UnShared Buckets: 45
The waiter count is 38
The mcWrappers in waiter queue []
Shared Connection information
Unshare Connection information
Figure 1
|
|
|
Resolving the
problem |
1) See if you have Waiters:
Waiters are connection requests waiting on a managed connection. If a
connection request is made and a managed connection is not allocated for
that request , the request will timeout and throw an error combination
like below in your SystemOut.log and or your trace file:
J2CA0045E: Connection not available while invoking method queueRequest for
resource jdbc/xxxxxx.
J2CA0020E: The Connection Pool Manager could not allocate a Managed
Connection: com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException:
Connection not available, Timed out waiting for 10988
One way to use the connection pool section in the trace file to
troubleshoot these kinds of errors is to enter the trace file and search
for the word "waiter". As you move down through the file, you will
expect to see the waiter count to increase and decrease if in fact you do
have waiters. If you do not have any waiters, the waiter count entry will
not appear in the trace for a particular connection object listing and you
can be assured that your connection pooling is sized properly and you do
not have contention for resources. If you do see waiters, you have
connection requests waiting on managed connections.
2) Discover if the waiters you have is on one or more PoolManager
Object:
Another important point to make at this point is that there can be more
than one PoolManager Object. There is one pool manager object per data
source that is configured for the application server and the number
assigned to that PoolManager object will remain the same for the duration
of the application server run from startup to stop of the same.
Once you discover you have waiters, you must find out if they are for one
or more PoolManager Objects. You can determine how many pool manager
objects you have by searching on the keyword PoolManager object then note
the numbers that follow. In the case of figure 1, the number is:
1424371533.
Once you have determined how many PoolManager Objects you have, you must
determine if the waiters are for one Object or many. You do this by going
through each of the PoolManager Objects to see if in fact that Object has
the waiters. You can usually tell after going through a few of them or try
the first one in the file and the last one in the file.
3) Assess the damage:
Look in the area called Shared Connection information that is below all
the PoolManger Object information. Pull that information out into a
separate file and then sort it by thread ID or manually move the same
thread ID lines together. You can see problems as such in some cases:
A) J2CA0086W errors can be found to look as follows in the Shared
Connection information section:
com.ibm.ws.LocalTransaction.LocalTranCoordImpl@4423ef2d MCWrapper
id 3705a8c8 Managed connection
com.ibm.ws.rsadapter.spi.WSRdbManagedConnectionImpl@2612ef7a
State:STATE_TRAN_WRAPPER_INUSE Thread Id: 2a1368fa Thread
Name: Servlet.Engine.Transports : 39 Handle count Start time inuse
Wed Jul 20 23:24:32 EDT 2005 Time inuse 929 (seconds)
com.ibm.ws.LocalTransaction.LocalTranCoordImpl@4423ef2d MCWrapper
id 67cc68ce Managed connection
com.ibm.ws.rsadapter.spi.WSRdbManagedConnectionImpl@6f4128ce
State:STATE_TRAN_WRAPPER_INUSE Thread Id: 2a1368fa Thread
Name: Servlet.Engine.Transports : 39 Handle count Start time inuse
Wed Jul 20 23:24:32 EDT 2005 Time inuse 929 (seconds)
What do we see here? We see that the transaction type is local for both
entries. We see the thread ID is indicating the same thread. We also see
that the managed connection is a different connection. Keep in mind that
we are looking in the shared connection area. So, what we have here is two
different connections that are held by the same thread and the same local
transaction. The result of this is that these two separate connections
will be held until the local transaction is complete. If our max
connections for the data source is 10, we have just decreased it to 8 for
the duration of the local transaction above.
B) Nested Local transactions:
In the information below that is taken from the Shared Connection
Information section as well, you may see the following:
com.ibm.ws.LocalTransaction.LocalTranCoordImpl@1a3fef31 MCWrapper
id 1fea8bc Managed connection
com.ibm.ws.rsadapter.spi.WSRdbManagedConnectionImpl@195ee8e4
State:STATE_TRAN_WRAPPER_INUSE Thread Id: 2d9228fa Thread
Name: Servlet.Engine.Transports : 40 Handle count Start time inuse Wed Jul
20 23:26:30 EDT 2005 Time inuse 811 (seconds)
com.ibm.ws.LocalTransaction.LocalTranCoordImpl@5f08288c MCWrapper
id 21d428c1 Managed connection
com.ibm.ws.rsadapter.spi.WSRdbManagedConnectionImpl@3a40e8d2
State:STATE_TRAN_WRAPPER_INUSE Thread Id: 2d9228fa Thread
Name: Servlet.Engine.Transports : 40 Handle count Start time inuse Wed Jul
20 23:26:28 EDT 2005 Time inuse 812 (seconds)
What do we see here? We see the same thread ID with two shared
connections that are of different connection IDs and two separate
transaction IDs. In this case, these are nested local transactions in the
same thread.
Because shareable connections are used, even if the app called close() on
the first connection it will remain allocated, and when the thread enters
a nested LTC and gets a connection, it cannot re-use the first connection
because of the differing LTC contexts. Each connection is released as the
associated LTC comes to an end.
These kind of situations are cause when you pass control of processing in
the thread from something like a servlet to another servlet or EJB™. Could
be several things. Again, we have taken up two connections out of our
pool.
Note! If the customer’s code must have this type of nested
architecture, they should use unshared connections to fix the problem.
Yes, they will see more connections but, the connections will release as
soon as the work with them is complete.
Nested Global and Local Transactions:
com.ibm.ws.Transaction.JTA.TransactionImpl@7ebdafd2#tid=510632
MCWrapper id 421fa8c1 Managed connection
com.ibm.ws.rsadapter.spi.WSRdbManagedConnectionImpl@1a5d68c8
State:STATE_TRAN_WRAPPER_INUSE Thread Id: 32c4e8fa Thread
Name: Servlet.Engine.Transports : 381 Handle count Start time inuse Wed
Jul 20 23:36:09 EDT 2005 Time inuse 231 (seconds)
com.ibm.ws.LocalTransaction.LocalTranCoordImpl@69132f2d MCWrapper
id 16df28df Managed connection
com.ibm.ws.rsadapter.spi.WSRdbManagedConnectionImpl@d5fe8e4
State:STATE_TRAN_WRAPPER_INUSE Thread Id: 32c4e8fa Thread
Name: Servlet.Engine.Transports : 381 Handle count Start time inuse Wed
Jul 20 23:36:05 EDT 2005 Time inuse 235 (seconds)
What do we see here? We have a local transaction that spawned a global
transaction. The same thread number is identified for both. We can tell
which one did the spawning by the Start time inuse entry in the entries
above. They both have different managed connection numbers so that means
is that each of them has a shared connection but not the same one. There
is two more connections that are basically dependant on each other out of
our max number of connections.
Long Running Transactions:
com.ibm.ws.LocalTransaction.LocalTranCoordImpl@31f36fda MCWrapper id
1e64a8af Managed connection
com.ibm.ws.rsadapter.spi.WSRdbManagedConnectionImpl@3d5328a8
State:STATE_TRAN_WRAPPER_INUSE Thread Id: 7b76e8fa Thread Name:
Servlet.Engine.Transports : 35 Handle count Start time inuse Wed Jul 20
23:23:17 EDT 2005 Time inuse 1003 (seconds)
What do we see here? We see this transaction in the Shared Connections
section is showing us that it has been in use for 1003 seconds. That is a
bit much. If you see entries such as these and the customer’s transaction
time out is much less than this, you can expect that there is something
holding up this transaction like the database or JDBC™. Also, note the
Handle count is null so it is waiting on something to complete.
Note: Transaction timeout does not kill a transaction when it
times out. The transaction timeout will not be enforced until the
transaction comes back from the database or what ever it is working into.
Therefore, if the transaction time-out is reached, all that means is that
when the transaction does come back, it will be rolled back or handled in
a non-positive way depending on how the customer has his transactions set
up in the deployment descriptor but, the default is to roll the
transaction back. Further, if a database, JMS or something else does not
perform the tasks requested of it in a transaction before the end of the
transaction timeout setting, the transaction timeout. WebSphere will
rollback the transaction. If the transaction does not come back or is not
released by the database, JMS or whatever, it will sit there and hold the
resources.
Follow on:
If you look in the Shared Connection information or the Unshared
Connection information and you see an entry no shared connections
or no unshared connections respectively, this does not mean that
there are no connections left. This means there are none in use.
Both shared and unshared connections count toward the maximum number of
connections set in your data source.
If you see the free connection count, you have to remember the minimum
connections. If connections time-out, they will go away down to the
minimum connections number.
Other possible causes of the ConnectionWaitTimeout:
The web container thread pool settings.
1. Setting the isGrowable flag on in the WebContainer thread pool can
cause this. If you have too many consumers for the size of the connection
pool.
2. Similarly, a web container thread pool whose maximum is too great for
the connection pool, then it can cause this too. |
|
|
|