Use this information to diagnose problems with workload distribution.
Note: This topic references one or more of the application
server log files. As a recommended alternative, you can configure
the server to use the High Performance Extensible Logging (HPEL) log
and trace infrastructure instead of using the SystemOut.log, SystemErr.log, trace.log, and activity.log files on distributed and IBM® i systems. You can also use
HPEL in conjunction with your native z/OS® logging facilities. If you are using HPEL, you can access
all of your log and trace information using the LogViewer command-line
tool from your server profile bin directory. For more details, see the
information about using HPEL to troubleshoot applications.
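What kind of problem are you seeing?
- HTTP requests are not distributed to all servers
- Enterprise bean requests are not distributed to all servers
- A failing server still receives enterprise bean requests (failover is not completed)
- Stopped or hung servers do not share the workload after being restored
- A cluster does not fail over to its backup cluster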
If none of the solutions described for these problems fixes your problem:
![[AIX Solaris HP-UX Linux Windows]](../images/dist.gif)
Browse the JVM logs of the deployment manager and application servers
that are having problems:
- Look up any error messages by selecting the Reference view
of the information center navigation and expanding Messages in
the navigation tree.
![[AIX Solaris HP-UX Linux Windows]](../images/dist.gif)
Use the Log and Trace Analyzer tool to
browse and analyze the service log (activity.log)
of the deployment manager and any nodes encountering problems. View
the activity.log files in both app_server_root/logs and
app_server_root/logs.
Analyze the service log (activity.log)
of the deployment manager and any nodes encountering problems. View
the activity.log files in profile_root/logs.
If Java exceptions appear in the log
files, try to determine the subcomponent that is directly involved
in the problem by examining the stack trace and looking for a product-related
class near the top of the stack (names beginning with com.ibm.websphere
or com.ibm.ws) that created the exception. If appropriate, review
the troubleshooting steps for that subcomponent under the Troubleshooting
WebSphere applications section of the Information Center.
For example, if the exception appears to have been thrown by a class in
the com.ibm.websphere.naming package, review the "Naming Services
Component troubleshooting tips" topic.
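As a rough illustration of the stack-trace check described above, the following Java sketch scans the lines of a logged stack trace from the top and returns the first class whose name begins with com.ibm.websphere. or com.ibm.ws. The class and method names (ProductFrameFinder, firstProductClass) are hypothetical, and the pattern assumes the standard "at package.Class.method(File.java:line)" frame format.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the topmost product-related frame in a logged stack trace. */
class ProductFrameFinder {
    // Matches a frame line such as "\tat com.ibm.ws.naming.Foo.bar(Foo.java:42)" and captures
    // the class name. An optional module or class-loader prefix (for example "java.base/") is skipped.
    private static final Pattern FRAME =
            Pattern.compile("^\\s*at\\s+(?:[^\\s/]+/+)?([\\w$.]+)\\.[\\w$<>]+\\(.*\\)\\s*$");

    /** Returns the class of the first product frame, reading from the top of the stack down. */
    static Optional<String> firstProductClass(List<String> stackLines) {
        for (String line : stackLines) {
            Matcher m = FRAME.matcher(line);
            if (!m.matches()) {
                continue; // skip the exception message and "Caused by:" lines
            }
            String className = m.group(1);
            if (className.startsWith("com.ibm.websphere.") || className.startsWith("com.ibm.ws.")) {
                return Optional.of(className);
            }
        }
        return Optional.empty();
    }
}
```

The package prefix of the returned class (for example, a naming package) suggests which subcomponent's troubleshooting topic to review.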
Browse the JVM logs of the deployment manager and application servers
that are having problems:
- Look up any error messages by selecting the Reference view
of the information center navigation and expanding Messages in
the navigation tree.
If Java exceptions appear in the log
files, try to determine the subcomponent that is directly involved
in the problem by examining the stack trace and looking for a product-related
class near the top of the stack (names beginning with com.ibm.websphere or com.ibm.ws)
that created the exception. If appropriate, review the troubleshooting steps
for that subcomponent under the Troubleshooting WebSphere
applications section of the Information Center.
For example,
if the exception appears to have been thrown by a class in the com.ibm.websphere.naming
package, review the "Naming Services Component troubleshooting tips"
topic.
- Ensure that all the machines in your configuration have TCP/IP
connectivity to each other by running the ping command:
  - From each physical server to the deployment manager
  - From the deployment manager to each physical server
- Although the problem is happening in a clustered environment,
the actual cause might be only indirectly related, or unrelated, to
clustering. Investigate all relevant possibilities:
  - If an enterprise bean on one or more servers is not serving requests,
review the "Cannot access an enterprise bean from a servlet, JSP,
stand-alone program, or other client" and "Cannot look up an object
hosted by the product from a servlet, JSP file, or other client" topics.
  - If problems seem to appear after you enable security, review the
"Errors or access problems after enabling security" topic.
  - If an application server stops responding to requests, or stops
unexpectedly (its process ends), review the "Web module or application server
dies or hangs" topic.
  - If SOAP requests are not being served by some or all servers,
review the "Errors returned to client trying to send a SOAP request"
topic.
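The ping check in the list above confirms only that a host answers ICMP echo requests; it does not show that a particular port accepts connections, and some networks block ICMP entirely. As a complementary check, this Java sketch tries to open a TCP connection to a host and port within a timeout. The class name TcpReachability is hypothetical, and which ports you test (for example, the ports the deployment manager and nodes use to communicate) depends on your own configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP connection to host:port can be opened. */
class TcpReachability {
    /**
     * Returns true if a TCP connection to the host and port is accepted within timeoutMillis.
     * Unlike ping (ICMP), this exercises the specific port a service listens on.
     */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // connection refused, timed out, or host name not resolvable
        }
    }
}
```

Run the same check in both directions, from each node to the deployment manager and back, just as the ping steps describe.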
![[AIX Solaris HP-UX Linux Windows]](../images/dist.gif)
If you have problems installing or deploying
an application on servers on one or more nodes, review the "Troubleshooting
code deployment and installation problems" topic.
![[AIX Solaris HP-UX Linux Windows]](../images/dist.gif)
If your topology consists of a Windows-based
deployment manager with servers on supported UNIX systems,
browse any recently updated .xml and .policy files
on the UNIX-based systems using vi to
ensure that Control-M characters are not present in the files. To
prevent this problem in the future, edit these files using vi on
the UNIX-based systems so that these characters are not inserted.
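When many files were copied from the Windows deployment manager, opening each one in vi can be tedious. This Java sketch walks a directory tree and lists the .xml and .policy files that contain a carriage return (byte 0x0D, the Control-M character). The class name ControlMScanner is hypothetical, and the directory you pass in is an assumption; point it at the configuration files you suspect on the UNIX-based system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists .xml and .policy files that contain carriage returns (Control-M). */
class ControlMScanner {
    static List<Path> filesWithCarriageReturns(Path root) throws IOException {
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(root)) {
            candidates = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.endsWith(".xml") || name.endsWith(".policy");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> offenders = new ArrayList<>();
        for (Path file : candidates) {
            for (byte b : Files.readAllBytes(file)) {
                if (b == '\r') { // carriage return, often displayed as ^M
                    offenders.add(file);
                    break;
                }
            }
        }
        return offenders;
    }
}
```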
![[AIX Solaris HP-UX Linux Windows]](../images/dist.gif)
Check for troubleshooting tips for the
workload management component.
- Check whether the problem is already identified and documented in
the available online support (hints and tips, technotes, and fixes).
HTTP requests are not distributed to all servers
If HTTP requests are not being distributed to
all servers:
- Check your Primary Servers list. If affinity has not been established,
the plug-in load balances across all servers that are defined in the
Primary Servers list. If you do not have a Primary Servers list
defined, the plug-in load balances across all servers defined in the
cluster. Once affinity has been established, the plug-in sends all
requests within the same HTTP session directly to that server.
- If some servers are servicing requests and one or more others
are not, try accessing a problem server directly, bypassing workload
management, to verify that it works. If that does not work:
  - Use the administrative console to ensure that the affected server
is running.
  - See the "Web resource does not display" topic for more information.
- See the "HTTP plug-in component troubleshooting tips" topic for
more information.
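To make the selection rule in the first item of this list concrete, here is a simplified Java model. Without affinity, requests rotate across the Primary Servers list, or across all cluster members when no such list is defined; once a session has affinity, its requests go to the same server. The class and method names are hypothetical, round-robin rotation is an assumption, and the real plug-in's load-balancing behavior is more involved and is not reproduced here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified, hypothetical model of the server-selection rule described above. */
class PluginSelectionModel {
    private final List<String> clusterMembers;
    private final List<String> primaryServers; // empty when no Primary Servers list is defined
    private final Map<String, String> affinity = new ConcurrentHashMap<>(); // session ID -> server
    private final AtomicInteger next = new AtomicInteger();

    PluginSelectionModel(List<String> clusterMembers, List<String> primaryServers) {
        this.clusterMembers = List.copyOf(clusterMembers);
        this.primaryServers = List.copyOf(primaryServers);
    }

    /** Picks a server for a request; sessionId is null when the request has no HTTP session. */
    String select(String sessionId) {
        if (sessionId != null) {
            String pinned = affinity.get(sessionId);
            if (pinned != null) {
                return pinned; // affinity established: go directly to that server
            }
        }
        // No affinity: rotate across the Primary Servers list, or the whole cluster if none is defined.
        List<String> candidates = primaryServers.isEmpty() ? clusterMembers : primaryServers;
        String chosen = candidates.get(Math.floorMod(next.getAndIncrement(), candidates.size()));
        if (sessionId != null) {
            affinity.put(sessionId, chosen); // in this model, the first request establishes affinity
        }
        return chosen;
    }
}
```

In this model, a running cluster member that is missing from the Primary Servers list never receives new requests, which matches the symptom of HTTP requests not reaching all servers.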
![[AIX Solaris HP-UX Linux Windows]](../images/dist.gif)
Check the steps for diagnosing workload
management issues in the "Troubleshooting the Workload Management
component" topic.
Enterprise bean requests are not distributed to all servers
If a client cannot reach a server in a cluster
that should be reachable, the server might be marked unusable or might be
down. To verify this:
- Use the administrative console to verify that the server is started.
If it is not started, start it; if it is started, stop and restart it.
- Browse the administrative console and verify that the node that
runs the server having the problem appears. If it does not:
  - Review the steps for adding a node to a cluster.
  - Review the steps in the section "One or more nodes do not
show up in the administrative console."
- If possible, try accessing the enterprise bean directly on the
problem server to see if there is a problem with TCP/IP connectivity,
application server health, or another problem not related to workload
management. If this fails, review the "Cannot access an enterprise bean
from a servlet, JSP, stand-alone program, or other client" topic.
![[AIX Solaris HP-UX Linux Windows]](../images/dist.gif)
Check the steps for diagnosing workload
management issues in the "Troubleshooting the Workload Management
component" topic.
A failing server still receives enterprise bean requests (failover is not completed)
Stopped or hung servers do not share the workload after being restored
This error occurs
when the workload management component does not recognize previously unavailable
servers after those servers are restored. The property
com.ibm.websphere.wlm.unusable.interval determines an unusable interval
during which the workload manager does not send requests to a server that has
been marked unusable. By default, this interval is 5 minutes.
You can
confirm that this is the problem by ensuring that servers that were
down are now up and capable of servicing requests. Then wait for the
unusable interval to elapse before checking whether failover
occurs.
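The timing described above can be modeled as follows: after a failure, a member is marked unusable and is not considered again until the unusable interval (5 minutes by default, according to the text above) has elapsed. This Java sketch is a hypothetical model for reasoning about that delay, not the workload management implementation; the class name, method names, and injectable clock are assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical model of an unusable interval: a marked member is skipped until the interval elapses. */
class UnusableIntervalTracker {
    private final Duration unusableInterval;
    private final Supplier<Instant> clock; // injectable so the behavior can be tested without waiting
    private final Map<String, Instant> markedAt = new ConcurrentHashMap<>();

    UnusableIntervalTracker(Duration unusableInterval, Supplier<Instant> clock) {
        this.unusableInterval = unusableInterval;
        this.clock = clock;
    }

    /** Records that a request to this member failed. */
    void markUnusable(String member) {
        markedAt.put(member, clock.get());
    }

    /** A member is eligible if it was never marked, or if the interval has elapsed since it was marked. */
    boolean isEligible(String member) {
        Instant marked = markedAt.get(member);
        if (marked == null) {
            return true;
        }
        if (!clock.get().isBefore(marked.plus(unusableInterval))) {
            markedAt.remove(member, marked); // interval over: the member may be tried again
            return true;
        }
        return false;
    }
}
```

In this model, a server that is restored 1 minute after being marked unusable still receives no requests for about another 4 minutes, which is the delay you wait out when confirming the problem.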
![[AIX Solaris HP-UX Linux Windows]](../images/dist.gif)
A cluster does not fail over to its backup cluster
You might experience
an error that is similar to the following sample:
[10/11/04 13:11:10:233 CDT] 00000036 SelectionMana A WWLM0061W: An error was
encountered sending a request to cluster member {MEMBERNAME=FlorenceEJBServer1,
NODENAME=fwwsaix1Node01} and that member has been marked unusable for future
requests to the cluster "", because of exception: org.omg.CORBA.COMM_FAILURE:
CONNECT_FAILURE_ON_SSL_CLIENT_SOCKET - JSSL0130E: java.io.IOException: Signals
that an I/O exception of some sort has occurred. Reason: Connection refused
vmcid: 0x49421000 minor code: 70 completed: No"
Perform
the following steps to fix your configuration:
- Review the deployment manager host name and bootstrap port in
each backup cluster setting.
- Review your core group bridge peer ports to make sure the host name
and distribution and consistency services (DCS) port are accurate.
- Verify that the names of your primary and backup clusters match.
- If requests from your application to the backup cluster go through
security, review your security configuration. You might need to use
single sign-on (SSO) and import the Lightweight Third Party Authentication
(LTPA) keys to the backup cell.
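While working through these steps, it can help to know exactly which members the workload manager marked unusable. This minimal Java sketch assumes that WWLM0061W entries contain a {MEMBERNAME=name, NODENAME=node} fragment, as in the sample above, and extracts those names from log text. The class and method names are hypothetical, and other message formats are not handled.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the member and node named in a WWLM0061W log entry. */
class Wwlm0061wParser {
    // Assumes the entry contains "{MEMBERNAME=<name>, NODENAME=<node>}", possibly wrapped across lines.
    private static final Pattern MEMBER = Pattern.compile(
            "WWLM0061W.*?\\{MEMBERNAME=([^,}]+),\\s*NODENAME=([^}]+)\\}", Pattern.DOTALL);

    /** Returns "member on node" if the text is a WWLM0061W entry that names a cluster member. */
    static Optional<String> markedMember(String logText) {
        Matcher m = MEMBER.matcher(logText);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1).trim() + " on " + m.group(2).trim());
    }
}
```

For the sample entry above, this sketch would report FlorenceEJBServer1 on fwwsaix1Node01, which tells you which host name and ports to review first.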