|
Problem(Abstract) |
How do the WLM unusable.interval and CORBA request.timeout settings work
together, and can the WLM unusable.interval effectively be disabled? |
|
|
|
Cause |
CORBA load balancing outside default spec |
|
|
Resolving the problem |
WebSphere® EJB communication is managed transparently by the ORB and WLM
components. The related parameters can be tuned to suit the environment
and then tested. For further consideration and detail, download and read
"IBM WebSphere V5.1 Performance, Scalability, and High Availability
WebSphere Handbook Series":
http://publib-b.boulder.ibm.com/abstracts/sg246198.html?Open
CORBA.requestTimeout can be adjusted for resource servers under heavy load
so that a request has enough time to complete, and CORBA.requestRetryCount
can be set alongside it for the same purpose. The WLM component in turn
marks the failing server unavailable, and unusable.interval specifies
exactly how long it stays marked that way.
The InfoCenter makes the following recommendation for J2EE client requests
that hang and cannot recover when a cluster resource is disconnected:
-CCDcom.ibm.CORBA.RequestTimeout=10
-CCDcom.ibm.websphere.wlm.Unusable.Interval=180
These can be set through the java -D command-line invocation,
launchclient.sh, sas.client.props, and so on.
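As a minimal sketch of how the two recommended settings might be passed in either style, the hypothetical helper below renders them with a `-CCD` prefix (the launchclient.sh form shown above) or a `-D` prefix (the plain java form). The class and method names are illustrative only and are not part of WebSphere, and both values are assumed to be in seconds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a WebSphere API): renders the two recommended settings. */
final class ClientTimeoutArgs {
    // Property names as they appear in the recommendation above.
    static final String REQUEST_TIMEOUT = "com.ibm.CORBA.RequestTimeout";
    static final String UNUSABLE_INTERVAL = "com.ibm.websphere.wlm.Unusable.Interval";

    /** Builds the property map; both values are assumed to be in seconds. */
    static Map<String, String> properties(int requestTimeoutSec, int unusableIntervalSec) {
        if (requestTimeoutSec < 0 || unusableIntervalSec < 0) {
            throw new IllegalArgumentException("values must not be negative");
        }
        Map<String, String> props = new LinkedHashMap<>();
        props.put(REQUEST_TIMEOUT, Integer.toString(requestTimeoutSec));
        props.put(UNUSABLE_INTERVAL, Integer.toString(unusableIntervalSec));
        return props;
    }

    /** Renders each property as prefix + key=value, separated by single spaces. */
    static String render(String prefix, Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(prefix).append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = properties(10, 180);
        System.out.println(render("-CCD", props)); // launchclient.sh style
        System.out.println(render("-D", props));   // java command-line style
    }
}
```

Keeping both values in one place makes it harder for the client launcher and the plain java invocation to drift apart.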
Question: If the TCP connection between client and server has already been
disconnected, does the trial request have to wait for RequestTimeout to
elapse, or does it switch immediately to one of the available servers?
Answer: This depends on the ORB. If the ORB knows about the bad
connection, it cleans up the connection and the request switches to one of
the servers (either an available server or the bad server, according to
the weighted round-robin distribution algorithm). If the ORB does not know
about the bad connection, the request is routed to the bad server and
waits for RequestTimeout again.
The RequestTimeout parameter sets how long the ORB waits for a response to
a method invocation. If the response is not received within this period,
the ORB throws a COMM_FAILURE exception with details.
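The wait-then-fail behavior described above can be illustrated with a small, self-contained sketch. It does not use the ORB: `CommFailure` is a hypothetical stand-in for the COMM_FAILURE exception, and `awaitReply` is an illustrative name, so this shows only the general shape of a bounded wait, not the ORB's actual implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class RequestTimeoutSketch {
    /** Hypothetical stand-in for COMM_FAILURE; not the real CORBA class. */
    static final class CommFailure extends RuntimeException {
        CommFailure(String detail, Throwable cause) {
            super(detail, cause);
        }
    }

    /** Waits up to timeoutMillis for the reply; throws CommFailure if none arrives in time. */
    static <T> T awaitReply(CompletableFuture<T> reply, long timeoutMillis) {
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new CommFailure("no response within " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            throw new CommFailure("request failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new CommFailure("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        // A reply that is already available is returned immediately.
        System.out.println(awaitReply(CompletableFuture.completedFuture("ok"), 100));
        try {
            // A reply that never arrives costs the caller the full timeout.
            awaitReply(new CompletableFuture<String>(), 50);
        } catch (CommFailure e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The second call shows why the timeout value matters: a request sent to a server that never answers blocks the caller for the whole period before the failure surfaces.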
The unusable interval has a minimum and a maximum value: 0 and
ng.MAX_VALUE. Setting the value to "0" means that a server is effectively
never marked as unusable. This is very dangerous: if a server or node were
to stall, clients would keep routing requests to it until it was brought
back up, and each of those requests would take the full request timeout to
fail. WLM should then transparently route the request to another
application server in the cluster. Setting the value very large means
that, once the server is marked bad, clients avoid it until that very long
interval has expired.
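The effect of the interval can be seen in a toy model. The sketch below is not WLM code: it uses plain (unweighted) round robin, a caller-supplied clock in seconds, and hypothetical names throughout, and its fallback when every server is marked unusable is an assumption made only to keep the example total.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an unusable interval; a simplification, not the WLM algorithm. */
final class UnusableIntervalSketch {
    private final List<String> servers;
    private final long unusableIntervalSec;
    private final Map<String, Long> unusableUntil = new HashMap<>();
    private int next = 0;

    UnusableIntervalSketch(List<String> servers, long unusableIntervalSec) {
        if (servers.isEmpty() || unusableIntervalSec < 0) {
            throw new IllegalArgumentException("need at least one server and a non-negative interval");
        }
        this.servers = new ArrayList<>(servers);
        this.unusableIntervalSec = unusableIntervalSec;
    }

    /** Records a failure at nowSec (assumed non-negative); an interval of 0 records nothing. */
    void markFailed(String server, long nowSec) {
        if (unusableIntervalSec == 0) {
            return; // the server is never treated as unusable
        }
        // Saturate instead of overflowing when the interval is very large.
        long until = (Long.MAX_VALUE - nowSec < unusableIntervalSec)
                ? Long.MAX_VALUE
                : nowSec + unusableIntervalSec;
        unusableUntil.put(server, until);
    }

    /** Round robin over servers that are not marked unusable at nowSec. */
    String pick(long nowSec) {
        for (int i = 0; i < servers.size(); i++) {
            String candidate = servers.get(next);
            next = (next + 1) % servers.size();
            Long until = unusableUntil.get(candidate);
            if (until == null || nowSec >= until) {
                return candidate;
            }
        }
        // Assumption: if every server is marked, fall back to plain round robin.
        String fallback = servers.get(next);
        next = (next + 1) % servers.size();
        return fallback;
    }

    public static void main(String[] args) {
        UnusableIntervalSketch disabled = new UnusableIntervalSketch(List.of("A", "B"), 0);
        disabled.markFailed("A", 0);
        System.out.println("interval 0 picks: " + disabled.pick(1));

        UnusableIntervalSketch enabled = new UnusableIntervalSketch(List.of("A", "B"), 180);
        enabled.markFailed("A", 0);
        System.out.println("interval 180 picks: " + enabled.pick(1));
    }
}
```

With an interval of 0 the failed server stays in the rotation, so each request routed to it would pay the request timeout before failing over. With a non-zero interval it is skipped until the interval has passed.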
*A code review confirms the expectation that a value of 0 *should*
effectively disable the unusable.interval. This is NOT recommended: the WLM
code would always try to go back to a failed server, and each such request
would then have to wait for the ORB request timeout. Increasing the
timeout parameters can be more accommodating in some situations, but it
can also degrade performance.
*WLM does not close or destroy any connections when it receives a
communications error. |
|
|
|