Use this information to diagnose a workload distribution problem.
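What kind of problem are you seeing?
- HTTP requests are not distributed to all servers
- Enterprise bean requests are not distributed to all servers
- A failing server still receives enterprise bean requests (failover is not completed)
- Stopped or hung servers do not share the workload after being restored
If none of these problem descriptions fixes your problem: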
- Browse the JVM logs of the problem deployment manager and
application servers:
- Look up any error messages by selecting the Reference view of the
information center navigation and expanding Messages in the navigation
tree.
If Java™ exceptions appear in the log files, try to determine
which subcomponent is directly involved in the problem. Examine
the stack trace and look for the product-related class nearest the top of the
stack (names beginning with com.ibm.websphere or com.ibm.ws)
that created the exception. Then review the troubleshooting steps for
that subcomponent under the Troubleshooting WebSphere applications section
of the Information Center.
For example, if the exception appears to
have been thrown by a class in the com.ibm.websphere.naming package, review
the Naming Services Component troubleshooting tips topic.
- Ensure that all the machines in your configuration have TCP/IP connectivity
to each other by running the ping command:
- From each physical server to the deployment manager
- From the deployment manager to each physical server
- Although the problem is happening in a clustered environment, the actual
cause might be only indirectly related, or unrelated, to clustering. Investigate
all relevant possibilities:
- If an enterprise bean on one or more servers is not serving requests,
review the "Cannot access an enterprise bean from a servlet, JSP, stand-alone
program, or other client" and "Cannot look up an object hosted by the product
from a servlet, JSP file, or other client" topics.
- If problems seem to appear after enabling security, review the "Errors
or access problems after enabling security" topic.
- If an application server stops responding to requests, or spontaneously
dies (its process closes), review the "Web module or application server dies
or hangs" topic.
- If SOAP requests are not being served by some or all servers, review the
"Errors returned to client trying to send a SOAP request" topic.
- Check to see if the problem is identified and documented by looking at
available online support (hints and tips, technotes, and fixes).
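The stack trace check described in the list above can be sketched in a few lines of Python. This is only an illustration, assuming a log excerpt in the usual Java "at package.Class.method(File.java:N)" frame format. The function name first_product_frame is hypothetical and is not part of the product.

```python
import re
from typing import Optional

# Package prefixes that the text identifies as product-related.
PRODUCT_PREFIXES = ("com.ibm.websphere", "com.ibm.ws")

# Matches a Java stack frame line such as "\tat com.example.Foo.bar(Foo.java:12)".
FRAME_PATTERN = re.compile(r"^\s*at\s+([\w.$<>]+)\(")


def first_product_frame(stack_trace: str) -> Optional[str]:
    """Return the topmost frame whose name begins with a product prefix.

    Returns None when no frame in the trace matches.
    """
    for line in stack_trace.splitlines():
        match = FRAME_PATTERN.match(line)
        if match and match.group(1).startswith(PRODUCT_PREFIXES):
            return match.group(1)
    return None
```

The package portion of the returned name (for example, a name under com.ibm.websphere.naming) suggests which subcomponent troubleshooting topic to review.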
HTTP requests are not distributed to all servers
If HTTP requests are not being distributed to all servers:
- Check your Primary Servers list. The plug-in load balances across all
servers that are defined in the Primary Servers list, if affinity has not
been established. If you do not have a Primary Servers list defined, the plug-in
load balances across all servers defined in the cluster, if affinity has not
been established. When affinity has been established, the plug-in
should send all requests within the same HTTP session directly to that server.
- If some servers are servicing requests and one or more others are not,
try accessing a problem server directly to verify that it works, apart from
workload management issues. If that does not work:
- Use the administrative console to ensure that the affected server is running.
- See the topic "Web resource does not display" for more information.
- See the "HTTP plug-in component troubleshooting tips" topic for more information.
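The routing rules described in the first item of the list above can be modeled as follows. This is a simplified sketch and not the plug-in's actual implementation: the class name PluginRouter is hypothetical, and plain round-robin selection is an assumption made to keep the example short.

```python
from itertools import cycle
from typing import Dict, List, Optional


class PluginRouter:
    """Simplified model of the routing rules in the text (not the real plug-in)."""

    def __init__(self, cluster_servers: List[str],
                 primary_servers: Optional[List[str]] = None):
        # Without a Primary Servers list, every server in the cluster is a candidate.
        candidates = list(primary_servers) if primary_servers else list(cluster_servers)
        if not candidates:
            raise ValueError("at least one server is required")
        self._rotation = cycle(candidates)
        self._affinity: Dict[str, str] = {}  # session ID -> server

    def route(self, session_id: Optional[str] = None) -> str:
        # Requests in a session with established affinity go to the same server.
        if session_id is not None and session_id in self._affinity:
            return self._affinity[session_id]
        # Assumption: plain round-robin across the candidate servers.
        server = next(self._rotation)
        if session_id is not None:
            self._affinity[session_id] = server
        return server
```

In this model, a server that is missing from a defined Primary Servers list never receives a request, which is why checking that list is the first step.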
Enterprise bean requests are not distributed to all servers
If a client cannot reach a server in a cluster that is thought
to be reachable, the server might be marked unusable or might be down. To verify
this:
- Use the administrative console to verify that the server is started. Try
starting it, or if started, stop and restart it.
- Browse the administrative console and verify that the node that runs the
server having the problem appears. If it does not:
- Review the steps for adding a node to a cluster.
- Review the steps in the section "One or more nodes do not show up
in the administrative console".
- If possible, try accessing the enterprise bean directly on the problem
server to see if there is a problem with TCP/IP connectivity, application
server health, or other problem not related to workload management. If this
fails, review the "Cannot access an enterprise bean from a servlet, JSP, stand-alone
program, or other client" topic.
A failing server still receives enterprise bean requests (failover is not completed)
Some possible causes of this problem are:
- The client might have been in a transaction with an enterprise
bean on the server that went down. Check the JVM logs of the application server
hosting the problem enterprise bean instance. If a request is returned with CORBA
SystemException COMM_FAILURE org.omg.CORBA.completion_status.COMPLETED_MAYBE,
this might be working as designed. The design is to let this particular exception
flow back to the client, since the transaction might have completed. Failing
over this request to another server could result in this request being serviced
twice.
- If requests sent to the servers consistently come back to the client
with any other exception, it is possible that no servers are available.
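The reasoning behind the first cause can be sketched as a small decision function. The three completion status names are the standard CORBA values. Only the COMPLETED_MAYBE case comes from the text above; the handling of COMPLETED_YES and COMPLETED_NO is an assumption made for illustration, and the function name is hypothetical.

```python
from enum import Enum


class CompletionStatus(Enum):
    """Standard CORBA completion status values."""
    COMPLETED_YES = "COMPLETED_YES"
    COMPLETED_NO = "COMPLETED_NO"
    COMPLETED_MAYBE = "COMPLETED_MAYBE"


def safe_to_retry_on_another_server(status: CompletionStatus) -> bool:
    """Decide whether retrying a failed request elsewhere risks running it twice.

    COMPLETED_MAYBE: the work might already be done, so the exception flows
    back to the client instead of failing over (as described in the text).
    COMPLETED_YES and COMPLETED_NO: assumptions of this sketch. A completed
    request is not repeated, and a request that never ran can be retried.
    """
    return status is CompletionStatus.COMPLETED_NO
```

A COMM_FAILURE with COMPLETED_MAYBE in the client's log is therefore consistent with failover working as designed, not with a workload management fault.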
Stopped or hung servers do not share the workload after being restored
This error occurs when previously
unavailable servers are not recognized by the workload management component
after those servers are restored. The property
com.ibm.websphere.wlm.unusable.interval sets an unusable interval, during
which the workload manager does not send requests to a server that has been marked
unusable. The default interval is 5 minutes.
You can confirm that this is the problem by ensuring that servers that
were down are now up and capable of servicing requests. Then wait for the
unusable interval to elapse before checking to determine whether failover occurs.
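The effect of the unusable interval can be illustrated with a small model. This is a sketch of the behavior described above, not the workload manager's code. The class and method names are hypothetical, and the 300-second value is the 5-minute default stated in the text.

```python
import time
from typing import Callable, Dict

UNUSABLE_INTERVAL_SECONDS = 300.0  # the 5-minute default stated in the text


class UnusableServerTracker:
    """Tracks servers marked unusable and when they become eligible again."""

    def __init__(self, interval: float = UNUSABLE_INTERVAL_SECONDS,
                 clock: Callable[[], float] = time.monotonic):
        self._interval = interval
        self._clock = clock  # injectable so the timing can be tested
        self._marked_at: Dict[str, float] = {}

    def mark_unusable(self, server: str) -> None:
        # Record the moment the server was marked unusable.
        self._marked_at[server] = self._clock()

    def is_eligible(self, server: str) -> bool:
        marked = self._marked_at.get(server)
        if marked is None:
            return True  # never marked unusable
        if self._clock() - marked >= self._interval:
            del self._marked_at[server]  # interval elapsed; try the server again
            return True
        return False  # still inside the unusable interval
```

In this model, a server that is restarted one minute after being marked unusable stays out of the rotation for four more minutes, which matches the symptom described in this section.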