Workload Management component troubleshooting tips

If the Workload Management component is not properly distributing the workload across servers in a multi-node configuration, use the following steps to isolate the problem.

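The basic steps for troubleshooting the Workload Management component are: eliminate environment or configuration issues, browse log files for WLM errors and CORBA minor codes, analyze PMI data, and resolve the problem or contact IBM support.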

Eliminate environment or configuration issues

First, determine the health of the cluster: are the servers capable of serving the applications for which they have been enabled? To do this, you must first identify the cluster that is exhibiting the problem.

If you are experiencing workload management problems related to HTTP requests, such as HTTP requests not being served by all members of the cluster, be aware that the HTTP plug-in balances the load across all servers defined in the PrimaryServers list if affinity has not been established. If no PrimaryServers list is defined, the plug-in balances the load across all servers defined in the cluster if affinity has not been established. If affinity has been established, the plug-in should route all requests directly to that server.
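The following Java sketch restates the server-selection rule described above as code, which can help when reasoning about which servers should be receiving requests. It is a simplified model, not the plug-in's implementation. The class name, method name, and server names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical, simplified model of how the HTTP plug-in chooses the set of
 * servers it balances across. This is not the plug-in's actual code.
 */
final class PluginRoutingModel {

    private PluginRoutingModel() {}

    /**
     * Returns the servers a request may be routed to.
     *
     * @param primaryServers servers in the PrimaryServers list; empty if no list is defined
     * @param clusterServers all servers defined in the cluster
     * @param affinityServer the server the request already has affinity with, if any
     */
    static List<String> candidateServers(List<String> primaryServers,
                                         List<String> clusterServers,
                                         Optional<String> affinityServer) {
        // Affinity established: every request goes directly to that server.
        if (affinityServer.isPresent()) {
            return List.of(affinityServer.get());
        }
        // No affinity: balance across the PrimaryServers list when one is defined...
        if (!primaryServers.isEmpty()) {
            return List.copyOf(primaryServers);
        }
        // ...otherwise balance across every server defined in the cluster.
        return List.copyOf(clusterServers);
    }
}
```

If requests carry no affinity and only a subset of the cluster ever receives them, comparing that subset with the PrimaryServers list is a quick first check.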

For workload management problems relating to enterprise bean requests, such as enterprise bean requests not getting served by all members of a cluster:

Note: The remainder of this article deals with enterprise bean workload balancing only. For more help on diagnosing problems in distributing Web (HTTP) requests, view the topics HTTP plug-in component troubleshooting tips and Web resource (JSP, servlet, html file, image, etc) will not display.

Browse log files for WLM errors and CORBA minor codes

If you still encounter problems with enterprise bean workload management, the next step is to check the activity log for entries that show:

To do this, use the Log Analyzer tool to open the service log (activity.log) on the affected servers, and look for the following entries:

If any of these warnings are encountered, follow the user response given in the log. If the warnings persist after you follow the user response, use the Log Analyzer to examine any other errors and warnings on the affected servers for:

You may also see exceptions with "CORBA" as part of the exception name, because WLM uses CORBA (Common Object Request Broker Architecture) to communicate between processes. Look for a statement in the exception stack specifying a "minor code". These codes denote the specific reason a CORBA call or response could not complete. WLM minor codes fall in the range 0x4921040 to 0x492104F. For an explanation of the minor codes related to WLM, see the Javadoc for the class com.ibm.websphere.wlm.WsCorbaMinorCodes.
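To scan a saved log or exception stack for WLM minor codes, a small Java helper like the following can extract every minor code and keep those in the WLM range given above. This is a sketch under assumptions: the class name is hypothetical, and the pattern assumes each code appears as "minor code:" followed by a hexadecimal value, with or without a 0x prefix. Adjust the pattern to match the exact text in your logs, and use the WsCorbaMinorCodes Javadoc to interpret any codes it reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that finds WLM-range CORBA minor codes in log text. */
final class WlmMinorCodeScanner {

    // WLM minor code range as stated in this article.
    static final int WLM_MIN = 0x4921040;
    static final int WLM_MAX = 0x492104F;

    // ASSUMPTION: codes appear as "minor code:" plus a hex value (optional 0x prefix).
    private static final Pattern MINOR_CODE =
            Pattern.compile("minor code:?\\s*(?:0x)?([0-9A-Fa-f]+)", Pattern.CASE_INSENSITIVE);

    private WlmMinorCodeScanner() {}

    static boolean isWlmMinorCode(int code) {
        return code >= WLM_MIN && code <= WLM_MAX;
    }

    /** Returns every WLM-range minor code found in the text, in order of appearance. */
    static List<Integer> findWlmMinorCodes(String logText) {
        List<Integer> found = new ArrayList<>();
        Matcher m = MINOR_CODE.matcher(logText);
        while (m.find()) {
            String hex = m.group(1);
            if (hex.length() > 8) {
                continue; // too long to be a 32-bit minor code
            }
            int code = Integer.parseUnsignedInt(hex, 16);
            if (isWlmMinorCode(code)) {
                found.add(code);
            }
        }
        return found;
    }
}
```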

Analyze PMI data

The purpose of analyzing the PMI data is to understand the workload arriving at each member of a cluster. The data for any one member of the cluster is useful only in the context of the data for all the members of the cluster. To obtain PMI data for all members of a cluster, see Performance monitoring infrastructure.

Once you have obtained the PMI data, calculate each member's numIncomingRequests as a percentage of the total numIncomingRequests for all members of the cluster. Comparing this percentage with each member's share of the total weight assigned to the cluster gives an initial view of how the workload is balanced across the members of the cluster.
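The calculation described above can be sketched in Java as follows. The class and method names are hypothetical. The input maps are assumed to hold each member's numIncomingRequests value from PMI and its configured weight, keyed by a member name of your choosing. A deviation near zero percentage points means the member's share of requests matches its share of the weight.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper comparing request shares with weight shares. */
final class WlmBalanceReport {

    private WlmBalanceReport() {}

    /** Each member's value as a percentage of the sum of all members' values. */
    static Map<String, Double> percentages(Map<String, ? extends Number> values) {
        double total = 0;
        for (Number v : values.values()) {
            total += v.doubleValue();
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, ? extends Number> e : values.entrySet()) {
            result.put(e.getKey(), total == 0 ? 0.0 : 100.0 * e.getValue().doubleValue() / total);
        }
        return result;
    }

    /** Actual share of numIncomingRequests minus expected share from weights, in percentage points. */
    static Map<String, Double> deviation(Map<String, Long> numIncomingRequests,
                                         Map<String, Integer> weights) {
        Map<String, Double> actual = percentages(numIncomingRequests);
        Map<String, Double> expected = percentages(weights);
        Map<String, Double> diff = new LinkedHashMap<>();
        for (String member : expected.keySet()) {
            diff.put(member, actual.getOrDefault(member, 0.0) - expected.get(member));
        }
        return diff;
    }
}
```

For example, with hypothetical weights of 2, 1, and 1 and request counts of 200, 100, and 100, the request shares are 50%, 25%, and 25%, and every deviation is 0.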

In addition to numIncomingRequests, two other metrics show how work is balanced among the members of a cluster: numIncomingStrongAffinityRequests and numIncomingNonWLMObjectRequests. These two metrics show the number of requests directed to a specific member of a cluster that could be serviced only by that member.

For example, consider a 3-server cluster in which we have assigned the following weights to the three servers:

Allow our cluster of servers to start servicing requests, and wait for the system to reach a steady state, that is, a state in which the number of incoming requests to the cluster equals the number of responses from the servers. In such a situation, we would expect the percentage of requests routed to each server to be:

Now let us consider the case in which there are no incoming requests with strong affinity and no non-WLM object requests.

In this scenario, let us assume that the PMI metrics gathered show the following numbers of incoming requests for each server:

Thus, the total number of requests coming into the cluster is: numIncomingRequestsCluster = numIncomingRequestsServer1 + numIncomingRequestsServer2 + numIncomingRequestsServer3 = 784

numIncomingStrongAffinityRequests = 0

numIncomingNonWLMObjectRequests = 0

Can we decide, based on this data, whether WLM is properly balancing the incoming requests among the servers in our cluster? Because there are no requests with strong affinity, the question we need to answer is whether the requests arrive in the ratios we expect from the assigned weights. The computation to answer that question is straightforward:
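Written generally, the comparison uses two percentages for each server i: its share of the total assigned weight and its share of numIncomingRequestsCluster, which is 784 in this example. The symbols w_i (the weight of server i) and n_i (its numIncomingRequests value) are introduced here for illustration only.

```latex
E_i = 100 \cdot \frac{w_i}{w_1 + w_2 + w_3}, \qquad
A_i = 100 \cdot \frac{n_i}{\mathrm{numIncomingRequestsCluster}} = 100 \cdot \frac{n_i}{784}, \qquad i = 1, 2, 3
```

With no strong affinity or non-WLM object requests, WLM is balancing as designed when A_i is approximately equal to E_i for every server.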

So WLM is behaving as designed: the data match exactly what is expected from the weights assigned to the servers.

Now let us consider a second 3-server cluster, in which we have assigned the following weights to the three servers:

Allow this cluster of servers to start servicing requests, and wait for the system to reach a steady state, that is, a state in which the number of incoming requests to the cluster equals the number of responses from the servers. In such a situation, we would expect the percentage of requests routed to Server1, Server2, and Server3 to be:

In this scenario, let us assume that the PMI metrics gathered show the following numbers of incoming requests for each server:

Thus, the total number of requests coming into the cluster is:

In this case, we see that the requests were not split evenly among the three servers, contrary to what we expected. Instead, the distribution is:

However, the correct interpretation of this data is that the routing of requests is not perfectly balanced because Server1 received several hundred strong affinity requests. WLM attempts to compensate for strong affinity requests directed to one or more servers by routing new incoming requests preferentially to servers that are not participating in transactional affinity. When the incoming requests include both strong affinity requests and non-WLM object requests, the analysis is analogous to this case.

If, once you have analyzed the PMI data and accounted for transactional affinity and non-WLM object requests, the percentages of actual incoming requests to the servers in a cluster do not reflect the assigned weights, the requests are not being properly balanced. In that case, repeat the steps described above for eliminating environment and configuration issues and browsing log files before proceeding.
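The decision described above can be sketched in Java as a simple check. It is only an illustration: the class, the Member holder, and the tolerancePoints threshold are hypothetical, and the sketch does not model how WLM compensates for strong affinity requests. It reports a clear imbalance only when the shares deviate from the weights and no strong affinity or non-WLM object requests were observed; otherwise it flags the data for the kind of manual analysis shown in the second example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical check of PMI request shares against configured weights. */
final class WlmBalanceCheck {

    enum Verdict { BALANCED, UNBALANCED, NEEDS_AFFINITY_ANALYSIS }

    /** Per-member PMI counters plus the member's configured weight. */
    static final class Member {
        final long numIncomingRequests;
        final long numIncomingStrongAffinityRequests;
        final long numIncomingNonWLMObjectRequests;
        final int weight;

        Member(long requests, long strongAffinity, long nonWlm, int weight) {
            this.numIncomingRequests = requests;
            this.numIncomingStrongAffinityRequests = strongAffinity;
            this.numIncomingNonWLMObjectRequests = nonWlm;
            this.weight = weight;
        }
    }

    private WlmBalanceCheck() {}

    /** tolerancePoints: hypothetical allowed deviation, in percentage points. */
    static Verdict check(Map<String, Member> members, double tolerancePoints) {
        long totalRequests = 0, totalPinned = 0, totalWeight = 0;
        for (Member m : members.values()) {
            totalRequests += m.numIncomingRequests;
            totalPinned += m.numIncomingStrongAffinityRequests + m.numIncomingNonWLMObjectRequests;
            totalWeight += m.weight;
        }
        if (totalRequests == 0 || totalWeight == 0) {
            throw new IllegalArgumentException("need nonzero request and weight totals");
        }
        boolean withinTolerance = true;
        for (Member m : members.values()) {
            double actual = 100.0 * m.numIncomingRequests / totalRequests;
            double expected = 100.0 * m.weight / totalWeight;
            if (Math.abs(actual - expected) > tolerancePoints) {
                withinTolerance = false;
            }
        }
        if (withinTolerance) {
            return Verdict.BALANCED;
        }
        // Requests that only one member can service may legitimately skew the split,
        // so report a plain imbalance only when none were observed.
        return totalPinned == 0 ? Verdict.UNBALANCED : Verdict.NEEDS_AFFINITY_ANALYSIS;
    }
}
```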

Resolve problem or contact IBM support

If the PMI data or client logs indicate an error in WLM, collect the following information and contact IBM support.

If none of these steps solves the problem, check to see if the problem has been identified and documented using the links in Diagnosing and fixing problems: Resources for learning. If you do not see a problem that resembles yours, or if the information provided does not solve your problem, contact IBM support for further assistance.

For current information available from IBM Support on known problems and their resolution, see the IBM Support page.

IBM Support has documents that can save you time gathering information needed to resolve this problem. Before opening a PMR, see the IBM Support page.


Related tasks
Troubleshooting by task: What are you trying to do?
Related reference
Troubleshooting installation problems



Searchable topic ID:   rtrb_wlmcomp
Last updated: Jun 21, 2007 4:55:42 PM CDT    WebSphere Application Server Network Deployment, Version 5.0.2
http://publib.boulder.ibm.com/infocenter/wasinfo/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/rtrb_wlmcomp.html
