WLM unusable interval and CORBA request timeout
 Technote (troubleshooting)
 
Problem(Abstract)
How do the WLM unusable interval and CORBA request timeout settings work together, and can the WLM unusable interval effectively be disabled?
 
Cause
CORBA load balancing outside the default specification
 
Resolving the problem
WebSphere® EJB communication is managed transparently through the ORB and WLM components. The related parameters can be tuned according to the environment and tested. It is also recommended to download and read "IBM WebSphere V5.1 Performance, Scalability, and High Availability WebSphere Handbook Series"
http://publib-b.boulder.ibm.com/abstracts/sg246198.html?Open for further consideration and detail.

The com.ibm.CORBA.RequestTimeout property can be adjusted for performance gain in heavy-load scenarios, giving resource servers enough time for a request to complete, and CORBA.requestRetryCount can be coupled with it to complete this task. The WLM component, in turn, marks a failing server unusable, and the unusable interval specifies exactly for how long.

The InfoCenter makes the following recommendation for the case where a J2EE client request hangs and can NOT be recovered after a cluster resource is disconnected:

-CCDcom.ibm.CORBA.RequestTimeout=10
-CCDcom.ibm.websphere.wlm.Unusable.Interval=180

These can be set via a java -D command-line invocation, launchclient.sh, sas.client.props, and so on.
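For illustration, the same values could also be supplied programmatically before any ORB-related code runs. This is a hypothetical sketch using plain System.setProperty (the class name is invented; the supported mechanisms remain the -CCD flags, java -D, and sas.client.props listed above), with the property names taken verbatim from the recommendation:

```java
// Hypothetical sketch: set the recommended WLM/ORB tuning values as system
// properties before the client ORB initializes. Whether the client runtime
// honors properties set this way depends on the environment.
public class WlmClientSettings {
    public static void main(String[] args) {
        // 10-second request timeout (com.ibm.CORBA.RequestTimeout=10)
        System.setProperty("com.ibm.CORBA.RequestTimeout", "10");
        // Mark a failed cluster member unusable for 180 seconds
        System.setProperty("com.ibm.websphere.wlm.Unusable.Interval", "180");

        System.out.println(System.getProperty("com.ibm.CORBA.RequestTimeout"));
        System.out.println(System.getProperty("com.ibm.websphere.wlm.Unusable.Interval"));
    }
}
```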

Question: If the TCP connection between client and server has already been disconnected, does the trial request have to wait for the RequestTimeout to complete, or does it switch immediately to one of the available servers?

Answer: This depends upon the ORB. If the ORB knows about the bad connection, it cleans up the connection and the request switches to one of the servers (either an available server or the bad one, according to the weighted round-robin distribution algorithm). If the ORB does not know about the bad connection, the request is routed to that bad server and waits for the RequestTimeout again.

The RequestTimeout parameter sets a timeout value that the ORB uses when waiting for a response to a method invocation. If the response is not received within this time period, the ORB throws a COMM_FAILURE exception with details.
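The timeout-then-exception behavior can be illustrated with plain java.util.concurrent rather than the ORB API itself. The invoke helper below is a hypothetical stand-in for a remote method invocation; the string "timed out" marks the point where the real ORB would throw org.omg.CORBA.COMM_FAILURE:

```java
import java.util.concurrent.*;

public class RequestTimeoutDemo {
    // Wait at most timeoutSeconds for a reply. The ORB would raise
    // org.omg.CORBA.COMM_FAILURE where this sketch returns "timed out".
    static String invoke(Callable<String> remoteCall, long timeoutSeconds) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(remoteCall).get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            return "timed out";
        } catch (Exception e) {
            return "error";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A stalled server that never responds within the timeout:
        System.out.println(invoke(() -> { Thread.sleep(60_000); return "reply"; }, 1));
        // A healthy server that responds promptly:
        System.out.println(invoke(() -> "reply", 1));
    }
}
```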

There is a minimum/maximum value for the unusable interval: 0 and Long.MAX_VALUE. By setting the value to 0, you effectively never mark a server as unusable. This is very dangerous: if a server or node were to stall, clients would still try to route requests to it until it was brought back up, and each of those requests would take the full request timeout to fail before WLM transparently routed it to another appserver in the cluster. By setting the value very large, once a server is marked bad, clients will avoid it until that very long interval has expired.

*Code review confirms the expectation that a value of 0 *should* effectively disable the unusable interval. It is NOT recommended that this be done: WLM code would then always try to go back to a failed server, and each such request would have to wait out the ORB request timeout. Additionally, increasing timeout parameters can be more accommodating depending on the situation, but can also degrade performance.

*WLM does not close/destroy any connections when it receives a communications error.
 
Related information
J2EE client request hangs and cannot be recovered
 
 
Cross Reference information
Segment: Application Servers
Product: Runtimes for Java Technology
Component: Java SDK
 
 


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > Workload Management (WLM)
Operating system(s): Windows
Software version: 5.0
Software edition:
Reference #: 1191220
IBM Group: Software Group
Modified date: Mar 27, 2006