This chapter helps you detect and resolve problems associated with Load Balancer.
Use the information in this section to gather the data that IBM service requires. The information is divided into the following subjects.
For the Dispatcher component only, there is a problem determination tool that automatically gathers operating system-specific data and component-specific configuration files. To run this tool, type lbpd from the appropriate directory:
This problem determination tool packages the data into files as follows:
Before you call IBM service, have the following information available.
dscontrol file save primary.cfg
This command places the configuration file in the following directory:
java -fullversion
netstat -ni
ipconfig /all
This is required from all servers and Load Balancer.
netstat -nr
route print
This is required from all servers and Load Balancer.
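The commands listed above can be run in one pass and their output saved for IBM service. The following is a minimal sketch, not part of the product: the function name `collect_outputs`, the output file names, and the 60-second timeout are assumptions chosen for illustration, and only the UNIX-style commands from the list are shown.

```python
import subprocess
from pathlib import Path


def collect_outputs(commands, out_dir):
    """Run each command and save its output to out_dir/<name>.txt.

    `commands` maps a short label (hypothetical, used as the file name)
    to the argument list to execute.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = {}
    for name, argv in commands.items():
        try:
            result = subprocess.run(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # keep stderr too; some tools report there
                text=True,
                timeout=60,
            )
            text = result.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure instead of stopping, so the remaining data is still gathered.
            text = f"could not run {argv!r}: {exc}\n"
        path = out_dir / f"{name}.txt"
        path.write_text(text)
        saved[name] = path
    return saved


# Commands taken from the list above; the labels on the left are illustrative.
SERVICE_COMMANDS = {
    "java_version": ["java", "-fullversion"],
    "interfaces": ["netstat", "-ni"],
    "routes": ["netstat", "-nr"],
}
```

Calling `collect_outputs(SERVICE_COMMANDS, "lb_service_data")` on each server and on the Load Balancer machine would leave one text file per command in the chosen directory. On Windows platforms, the entries would be replaced with `ipconfig /all` and `route print`.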
Gather the following required information for problems in an HA environment.
The script names are:
Also include the configuration files. See General information (always required).
Gather the following required information for advisor problems; for example, when advisors are mistakenly marking servers as down.
dscontrol advisor loglevel http 80 5
or
dscontrol advisor loglevel advisorName port loglevel
or
dscontrol advisor loglevel advisorName cluster:port loglevel
or
nalcontrol metriccollector set consultantID:serviceID:metricName loglevel value
This creates a log named ADV_advisorName.log; for example, ADV_http.log. This log is located as follows:
Where component is:
The ADVLOG call prints statements to the advisors log file when the level is less than the logging level associated with the advisors. A logging level of 0 will cause the statement to always be written. You cannot use ADVLOG from the constructor. The log file is not created until immediately after the custom advisor's constructor has completed because the log file name depends on information that is set in the constructor.
There is another way to debug your custom advisor that will avoid this limitation. You can use System.out.println(message) statements to print messages to a window. Edit the dsserver script and change javaw to java for the print statements to appear in the window. The window used to start dsserver must be kept open for the prints to appear. If you are using Windows platforms, you must stop the Dispatcher from running as a service and manually start it from a window to see the messages.
Refer to Programming Guide for Edge Components for more information on ADVLOG.
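The ADVLOG rule described above can be illustrated with a small model. This is a toy sketch in Python, not the Java ADVLOG API: the class name `AdvisorLogModel` and its methods are hypothetical, and the condition is only a reading of the wording above (level 0 is always written; otherwise the statement level must be less than the advisor's logging level).

```python
class AdvisorLogModel:
    """Toy model of the ADVLOG rule; not the IBM API."""

    def __init__(self, loglevel):
        self.loglevel = loglevel  # logging level associated with the advisor
        self.lines = []           # stands in for the ADV_advisorName.log file

    def advlog(self, level, message):
        """Record message if the rule allows it; return True when written."""
        # Level 0 is always written; any other level must be below the
        # advisor's logging level.
        if level == 0 or level < self.loglevel:
            self.lines.append(message)
            return True
        return False
```

With `dscontrol advisor loglevel http 80 5`, for example, this model would record statements logged at levels 0 through 4 and drop those at level 5 or higher.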
Gather the following required information for Content Based Routing problems.
If you are not able to hit the cluster, it is possible that neither or both of the Load Balancer machines have the cluster aliased. To determine which machine owns the cluster:
ping cluster
arp -a
If you are using Dispatcher's nat or cbr forwarding methods, ping the return address also.
If you do not get a response from the ping, and you are not using ULB, it is possible that neither machine has the cluster IP address aliased to its interface; for example, en0, tr0, and so forth.
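After the ping, the `arp -a` output shows which MAC address answered for the cluster address, and that MAC address can be compared with the adapters on each Load Balancer machine. The sketch below picks the entry for one IP address out of saved `arp -a` output. It is an illustration only: the function name `mac_for_ip` is hypothetical, and because `arp -a` output differs between platforms, the parser simply assumes that the IP address and its MAC address appear on the same line.

```python
import re

# Six groups of one or two hex digits separated by ':' or '-'.
MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{1,2}[:-]){5}[0-9A-Fa-f]{1,2}")


def mac_for_ip(arp_output, ip):
    """Return the MAC address listed for ip in `arp -a` output, or None."""
    # Match the address as a whole token so 10.0.0.5 does not match 10.0.0.50.
    ip_re = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?![\d.])")
    for line in arp_output.splitlines():
        if ip_re.search(line):
            match = MAC_RE.search(line)
            if match:
                # Normalize styles such as 00-0C-29-... and 0:c:29:... to one form.
                octets = re.split(r"[:-]", match.group(0))
                return ":".join(o.zfill(2).lower() for o in octets)
    return None
```

If the returned MAC address belongs to neither Load Balancer machine, or no entry is found, that points back to the aliasing problem described above.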
If you are unable to solve routing problems and all else has failed, issue the following command to run a trace on the network traffic:
iptrace -a -s failingClientIPAddress -d clusterIPAddress -b iptrace.trc
Run the trace, recreate the problem, then kill the process.
tcpdump -i lan0 host cluster and host client
You may need to download tcpdump from one of the HP-UX GNU software archive sites.
tcpdump -i eth0 host cluster and host client
Run the trace, recreate the problem, then kill the process.
snoop -v clientIPAddress destinationIPAddress > snooptrace.out
You can also increase different log levels (for example, manager log, advisor log, and so forth) and investigate their output.
To identify a problem that is already fixed in a service release fix or patch, check for upgrades. To obtain a list of Edge Components defects fixed, refer to the WebSphere® Application Server Web site Support page: http://www.ibm.com/software/webservers/appserv/was/support/. From the Support page, follow the link to the corrective service download site.
The correct version of Java code is installed as part of the Load Balancer installation.
See Reference Information for links to support and library Web pages. The Web support page contains a link to Self-help information in the form of Technotes.
Refer to the following for:
Symptom | Possible Cause | Go to... |
---|---|---|
Dispatcher not running correctly | Conflicting port numbers | Checking Dispatcher port numbers |
Configured a collocated server and it will not respond to load balanced requests | Wrong or conflicting address | Problem: Dispatcher and server will not respond |
Connections from client machines not being served or connections timing out | | Problem: Dispatcher requests are not being balanced |
Client machines are not being served or are timing out | High availability not working | Problem: Dispatcher high-availability function is not working |
Unable to add heartbeat (Windows platform) | Source address is not configured on an adapter | Problem: Unable to add heartbeat (Windows platform) |
Advisors not working correctly with wide area | Advisors are not running on remote machines | Problem: Advisors not working correctly |
On a backend server running Windows Server 2008, memload.exe crashes | The Windows Server 2008 registry might not be populated with the performance keys that these tools require. This application crash would be reported from the cpuload application. | Problem: On a Windows Server 2008 backend server, memload.exe crashes |
Dispatcher, Microsoft IIS, and SSL are not working or will not continue | Unable to send encrypted data across protocols | Problem: Dispatcher, Microsoft IIS, and SSL do not work (Windows platform) |
Connection to remote machine refused | Older version of the keys is still being used | Problem: Dispatcher connection to a remote machine |
The dscontrol or lbadmin command fails with 'Server not responding' or 'unable to access RMI server' message | | Problem: dscontrol or lbadmin command fails |
"Cannot Find the File..." error message, when running Netscape as default browser to view online help (Windows platform) | Incorrect setting for HTML file association | Problem: "Cannot find the file..." error message when trying to view online Help (Windows platform) |
Graphical user interface does not start correctly | Insufficient paging space | Problem: Graphical user interface (GUI) does not start correctly |
Error running Dispatcher with Caching Proxy installed | Caching Proxy file dependency | Problem: Error running Dispatcher with Caching Proxy installed |
Graphical user interface does not display correctly. | Resolution is incorrect. | Problem: Graphical user interface (GUI) does not display correctly |
Help panels sometimes disappear behind other windows | Java limitation | Problem: On Windows platform, help windows sometimes disappear behind other open windows |
Load Balancer cannot process and forward a frame | Need a unique MAC address for each NIC | Problem: Load Balancer cannot process and forward a frame |
Blue screen appears | No installed and configured network card | Problem: A blue screen displays when you start the Load Balancer executor |
Path to Discovery prevents return traffic | The cluster is aliased on the loopback | Problem: Path to Discovery prevents return traffic with Load Balancer |
High availability in the Wide Area mode of Load Balancer does not work. | Remote Dispatcher must be defined as a server in a cluster on local Dispatcher | Problem: High availability in the Wide Area mode of Load Balancer does not work |
GUI hangs (or unexpected behavior) when trying to load a large configuration file. | Java does not have access to enough memory to handle such a large change to the GUI | Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file |
IP addresses not resolving correctly over the remote connection | When using a remote client over a secure socks implementation, fully qualified domain names or host names might not resolve to the correct IP address | Problem: IP addresses not resolving correctly over the remote connection |
Korean Load Balancer interface displays overlapping or undesirable fonts on AIX and Linux systems | Default fonts must be changed | Problem: Korean Load Balancer interface displays overlapping or undesirable fonts on AIX and Linux systems |
On Windows systems, after aliasing the MS Loopback adapter, when issuing certain commands such as hostname, the OS will incorrectly respond with the alias address | In the network connections list, the newly added alias must not be listed above the local address | Problem: On Windows systems, alias address is returned instead of local address when issuing commands such as hostname |
Unexpected GUI behavior when using Windows platform paired with Matrox AGP video card | Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI | Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards |
Unexpected behavior, such as system hang, when executing "rmmod ibmlb" on Linux systems | Problem occurs when manually removing the Load Balancer kernel module (ibmlb). | Problem: Unexpected behavior when executing "rmmod ibmlb" (Linux systems) |
Slow response time when running commands on the Dispatcher machine | Slow response time can be due to machine overloading from a high volume of client traffic | Problem: Slow response time running commands on Dispatcher machine |
For Dispatcher's mac forwarding method, SSL or HTTPS advisor not registering server loads | Problem occurs because the SSL server application is not configured with the cluster IP address | Problem: SSL or HTTPS advisor not registering server loads (when using mac-forwarding) |
Disconnect from host when using remote Web administration through Netscape | Disconnect from host occurs when you resize the browser window | Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration |
On Windows platform, corrupted Latin-1 national characters appear in command prompt | Change font properties of command prompt window | Problem: On Windows systems, corrupted Latin-1 national characters appear in command prompt window |
On HP-UX platform, the following message occurs: java.lang.OutOfMemoryError unable to create new native thread | Some HP-UX installations by default allow 64 threads per process. This is insufficient. | Problem: On HP-UX, Java out of memory or thread error occurs |
On Windows platform, advisors and reach targets mark all servers down | Task offloading is not disabled, or ICMP might need to be enabled. | Problem: On Windows systems, advisors and reach targets mark all servers down |
On Windows platform, problem resolving IP address to hostname when more than one address is configured to an adapter | The IP address you want as your hostname must appear first in the registry. | Problem: On Windows platform, resolving IP address to host name when more than one address is configured to an adapter |
On Windows platform, advisors not working in a high availability setup after a network outage | When the system detects a network outage, it clears its Address Resolution Protocol (ARP) cache | Problem: On Windows systems, after network outage, advisors not working in a high availability setup |
On Linux systems, "IP address add" command and multiple cluster loopback aliases are incompatible | When aliasing more than one address on the loopback device, use the ifconfig command, not ip address add | Problem: On Linux systems, do not use "IP address add" command when aliasing multiple clusters on the loopback device |
Error message: "Router address not specified or not valid for port method" when trying to add a server | Checklist of information to determine the problem that has occurred when adding a server | Problem: "Router address not specified or not valid for port method" error message |
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started | Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. | Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started |
Slow down occurs when loading Load Balancer configurations | The delay might be due to Domain Name System (DNS) calls that are made to resolve and verify the server address. | Problem: Delay occurs while loading a Load Balancer configuration |
On Windows systems, the following error message appears: There is an IP address conflict with another system on the network | If high availability is configured, cluster addresses may be configured on both machines for a brief period which causes this error message to appear. | Problem: On Windows systems, an IP address conflict error message appears |
Both primary and backup machines are active in a high availability configuration | This problem may occur when the go scripts do not run on either primary or backup machine. | Problem: Both primary and backup machines are active in a high availability configuration |
Client requests fail when Dispatcher attempts to return large page responses | Client requests that result in large page responses timeout if the maximum transmit unit (MTU) is not set properly on the Dispatcher machine when using nat or cbr forwarding. | Problem: Client requests fail when attempting the return of large page responses |
On Windows systems, "Server not responding" error occurs when issuing a dscontrol or lbadmin command | More than one IP address exists on the Windows system and the host file does not specify the address to associate with the hostname. | Problem: On Windows systems, "Server not responding" error occurs when issuing dscontrol or lbadmin |
High availability Dispatcher machines may fail to synchronize on Linux for S/390 on qeth devices | When using high availability on Linux for S/390 with the qeth network driver, the active and standby Dispatchers may fail to synchronize. | Problem: High availability Dispatcher machines may fail to synchronize on Linux for S/390 systems on qeth drivers |
Tips on configuring the high availability feature for Load Balancer | The tips will help alleviate high availability problems such as: | Problem: Tips on configuring high availability |
Dispatcher MAC forwarding configuration limitations with zSeries and S/390 platforms | On Linux, there are limitations when using zSeries or S/390 servers that have Open System Adapter (OSA) cards. Possible workarounds are provided. | Problem: On Linux, Dispatcher configuration limitations when using zSeries or S/390 servers that have Open System Adapter (OSA) cards |
On some Red Hat Linux versions, a memory leak occurs when running Load Balancer configured with the manager and advisors | The IBM Java SDK versions of the JVM and the Native POSIX Thread Library (NPTL) shipped with some Linux distributions, such as Red Hat Enterprise Linux 3.0, can cause the memory leak to occur. | Problem: On some Linux versions, a memory leak occurs when running Dispatcher configured with the manager and advisors |
On SUSE Linux Enterprise Server 9, Dispatcher report indicates that packets are forwarded (packet-count increases), however packets never actually reach the backend server | The iptables NAT module is loaded. There is a possible, but unconfirmed, error in this version of iptables that causes strange behavior when interacting with Dispatcher. | Problem: On SUSE Linux Enterprise Server 9, Dispatcher forwards packets, but the packets do not reach the backend server |
On Windows systems, when using Dispatcher's high availability feature, problems might occur during takeover | If the goScript that configures the cluster IP address on the active machine runs before the goScript to unconfigure the IP cluster address on the backup machine, problems might occur. | Problem: On Windows system, IP address conflict message appears during high availability takeover |
On Linux systems, iptables can interfere with the routing of packets | Linux iptables can interfere with load balancing of traffic and must be disabled on the Load Balancer machine. | Problem: Linux iptables can interfere with the routing of packets |
A Java fileset warning message appears when installing service fixes or installing natively, using system packaging tools | The product installation consists of several packages which are not required to be installed on the same machine, so each of these packages installs a Java fileset. When they are installed on the same machine, a warning message states that the Java fileset is also owned by another fileset. | |
Upgrading the Java fileset provided with the Load Balancer installations | If a problem is found with the Java file set, you should report the problem to IBM Service so that you can receive an upgrade for the Java file set that was provided with the Load Balancer installation. | Upgrading the Java file set provided with the Load Balancer installation |
Persistent connections might drop during high availability takeover on a Windows platform | On Microsoft Windows operating systems, persistent connections might drop during a high availability takeover. This problem exists only when you have a collocated server that uses the MAC forwarding method. | Problem: Persistent connections might drop during high availability takeover |
Symptom | Possible Cause | Go to... |
---|---|---|
CBR not running correctly | Conflicting port numbers | Checking CBR port numbers |
The cbrcontrol or lbadmin command fails with 'Server not responding' or 'unable to access RMI server' message | Commands fail due to a socksified stack, or because cbrserver was not started. | Problem: cbrcontrol or lbadmin command fails |
Requests are not being load balanced | Caching Proxy was started before the executor was started | Problem: Requests not being load balanced |
On Solaris, the cbrcontrol executor start command fails with ‘Error: Executor was not started.' message | Command fails because the system IPC defaults may need to be modified, or link to library is incorrect. | Problem: On Solaris systems, cbrcontrol executor start command fails |
URL rule does not work | Syntactical or configuration error | Problem: Syntactical or configuration error |
Unexpected GUI behavior when using Windows systems paired with Matrox AGP video card | Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI | Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards |
GUI hangs (or unexpected behavior) when trying to load a large configuration file. | Java does not have access to enough memory to handle such a large change to the GUI | Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file |
Disconnect from host when using remote Web administration through Netscape | Disconnect from host occurs when you resize the browser window | Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration |
On Windows platform, corrupted Latin-1 national characters appear in command prompt | Change font properties of command prompt window | Problem: On Windows platform, corrupted Latin-1 national characters appear in command prompt window |
On HP-UX platform, the following message occurs: java.lang.OutOfMemoryError unable to create new native thread | Some HP-UX installations by default allow 64 threads per process. This is insufficient. | Problem: On HP-UX, Java out of memory/thread error occurs |
On Windows platform, advisors and reach targets mark all servers down | Task offloading is not disabled, or ICMP might need to be enabled. | Problem: On Windows systems, advisors and reach targets mark all servers down |
On Windows platform, problem resolving IP address to host name when more than one address is configured to an adapter | The IP address you want as your hostname must appear first in the registry. | Problem: On Windows systems, resolving IP address to host name when more than one address is configured to an adapter |
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started | Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. | Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started |
Symptom | Possible Cause | Go to... |
---|---|---|
Site Selector not running correctly | Conflicting port number | Checking Site Selector port numbers |
Site Selector does not round-robin incoming requests from Solaris client | Solaris systems run a "name service cache daemon" | Problem: Site Selector does not round-robin traffic from Solaris clients |
The sscontrol or lbadmin command fails with 'Server not responding' or 'unable to access RMI server' message | Commands fail due to a socksified stack, or because ssserver was not started. | Problem: sscontrol or lbadmin command fails |
ssserver fails to start on Windows platform | Windows systems do not require the host name to be in the DNS. | Problem: The ssserver is failing to start on Windows platform |
Machine with duplicate routes not load balancing correctly -- name resolution appears to fail | Site Selector machine with multiple adapters attached to the same subnet | Problem: Site Selector with duplicate routes not load balancing correctly |
Unexpected GUI behavior when using Windows platform paired with Matrox AGP video card | Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI | Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards |
GUI hangs (or unexpected behavior) when trying to load a large configuration file. | Java does not have access to enough memory to handle such a large change to the GUI | Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file |
Disconnect from host when using remote Web administration through Netscape | Disconnect from host occurs when you resize the browser window | Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration |
On Windows platform, corrupted Latin-1 national characters appear in command prompt | Change font properties of command prompt window | Problem: On Windows platform, corrupted Latin-1 national characters appear in command prompt window |
On HP-UX platform, the following message occurs: java.lang.OutOfMemoryError unable to create new native thread | Some HP-UX installations by default allow 64 threads per process. This is insufficient. | Problem: On HP-UX, Java out of memory/thread error occurs |
On Windows platform, advisors and reach targets mark all servers down | Task offloading is not disabled, or ICMP might need to be enabled. | Problem: On Windows systems, advisors and reach targets mark all servers down |
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started | Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. | Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started |
Symptom | Possible Cause | Go to... |
---|---|---|
ccoserver will not start | Conflicting port numbers | Checking Cisco CSS Controller port numbers |
The ccocontrol or lbadmin command fails with 'Server not responding' or 'unable to access RMI server' message | Commands fail due to a socksified stack, or because ccoserver was not started. | Problem: ccocontrol or lbadmin command fails |
receive error: Cannot create registry on port 13099 | Expired product license | Problem: Cannot create registry on port 13099 |
Unexpected GUI behavior when using Windows platform paired with Matrox AGP video card | Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI | Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards |
Received a connection error when adding a consultant | Configuration settings are incorrect on the switch or the controller | Problem: Received a connection error when adding a consultant |
Weights are not being updated on the switch | Communication between the controller and the switch is unavailable or interrupted | Problem: Weights are not being updated on the switch |
Refresh command did not update the consultant configuration | Communication between the switch and the controller is unavailable or interrupted | Problem: Refresh command did not update the consultant configuration |
GUI hangs (or unexpected behavior) when trying to load a large configuration file. | Java does not have access to enough memory to handle such a large change to the GUI | Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file |
Disconnect from host when using remote Web administration through Netscape | Disconnect from host occurs when you resize the browser window | Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration |
On Windows platform, corrupted Latin-1 national characters appear in command prompt | Change font properties of command prompt window | Problem: On Windows platform, corrupted Latin-1 national characters appear in command prompt window |
On HP-UX platform, the following message occurs: java.lang.OutOfMemoryError unable to create new native thread | Some HP-UX installations by default allow 64 threads per process. This is insufficient. | Problem: On HP-UX, Java out of memory/thread error occurs |
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started | Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. | Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started |
Symptom | Possible Cause | Go to... |
---|---|---|
nalserver will not start | Conflicting port numbers | Checking Nortel Alteon Controller port numbers |
The nalcontrol or lbadmin command fails with 'Server not responding' or 'unable to access RMI server' message | Commands fail due to a socksified stack, or because nalserver was not started. | Problem: nalcontrol or lbadmin command fails |
receive error: Cannot create registry on port 14099 | Expired product license | Problem: Cannot create registry on port 14099 |
Unexpected GUI behavior when using Windows platform paired with Matrox AGP video card | Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI | Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards |
GUI hangs (or unexpected behavior) when trying to load a large configuration file. | Java does not have access to enough memory to handle such a large change to the GUI | Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file |
Disconnect from host when using remote Web administration through Netscape | Disconnect from host occurs when you resize the browser window | Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration |
Received a connection error when adding a consultant | Configuration settings are incorrect on the switch or the controller | Problem: Received a connection error when adding a consultant |
Weights are not being updated on the switch | Communication between the controller and the switch is unavailable or interrupted | Problem: Weights are not being updated on the switch |
Refresh command did not update the consultant configuration | Communication between the switch and the controller is unavailable or interrupted | Problem: Refresh command did not update the consultant configuration |
On Windows platform, corrupted Latin-1 national characters appear in command prompt | Change font properties of command prompt window | Problem: On Windows systems, corrupted Latin-1 national characters appear in command prompt window |
On HP-UX platform, the following message occurs: java.lang.OutOfMemoryError unable to create new native thread | Some HP-UX installations by default allow 64 threads per process. This is insufficient. | Problem: On HP-UX, Java out of memory/thread error occurs |
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started | Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. | Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started |
Symptom | Possible Cause | Go to... |
---|---|---|
Metric Server IOException on Windows platform running .bat or .cmd user metric files | Full metric name is required | Problem: Metric Server IOException on Windows platform running .bat or .cmd user metric files |
Metric Server not reporting the load information to the Load Balancer machine | Possible causes include: | Problem: Metric Server not reporting loads to Load Balancer machine |
Metric Server log reports "Signature is necessary for access to agent" when key files transferred to server | Key file fails authorization due to corruption. | Problem: Metric Server log reports "Signature is necessary for access to agent" |
On AIX systems, when running Metric Server under heavy stress on a multi-processor system (AIX 5.1), ps -vg command output may become corrupted | APAR IY33804 corrects this known AIX problem | Problem: On AIX systems, while running Metric Server under heavy stress, ps -vg command output may become corrupted |
Configuring Metric Server in a two-tier configuration with Site Selector load-balancing across high-availability Dispatchers | Metric Server (residing in the second-tier) is not configured to listen on a new IP address. | Problem: Configuring Metric Server in a two-tier configuration with Site Selector load-balancing across high-availability Dispatchers |
Scripts (metricserver, cpuload, memload) running on multi-CPU Solaris machines produce unwanted console messages | This behavior is due to the use of the VMSTAT system command to gather CPU and memory statistics from the kernel. | Problem: Scripts, running on multi-CPU Solaris machines, produce unwanted console messages |
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started | Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. | Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started |
Metric value returns -1 after starting Metric Server | This problem may be caused by the key files losing their integrity during transfer to the client. | Problem: After starting Metric Server, metric value returns -1 |
If you are experiencing problems running Dispatcher, it may be that one of your applications is using a port number that the Dispatcher normally uses. Be aware that the Dispatcher server uses the following port numbers:
If another application is using one of the Dispatcher's port numbers, you can either change the Dispatcher's port numbers or change the application's port number.
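Before changing any port numbers, it can help to confirm which of a component's ports already have a listener on the machine. The sketch below is an illustration under stated assumptions: the helper names `port_in_use` and `find_conflicts` are hypothetical, the check only covers TCP listeners reachable on the given host address, and the port numbers to test must be supplied by you from the list for the component in question.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP listener already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


def find_conflicts(ports, host="127.0.0.1"):
    """Return the subset of ports that already have a listener."""
    return [port for port in ports if port_in_use(port, host)]
```

Run the check while the Load Balancer component is stopped. Any port that `find_conflicts` reports at that point is held by another application, which is the situation this section describes.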
Change the Dispatcher’s port numbers by doing the following:
Change the application's RMI port number by doing the following:
If you are experiencing problems running CBR, it may be that one of your applications is using a port number that CBR normally uses. Be aware that CBR uses the following port number:
If another application is using one of the CBR's port numbers, you can either change the CBR's port numbers or change the application's port number.
Change the CBR’s port numbers by doing the following:
Change the application's RMI port number by doing the following:
If you are experiencing problems running the Site Selector component, it may be that one of your applications is using a port number that Site Selector normally uses. Be aware that Site Selector uses the following port numbers:
If another application is using one of the Site Selector's port numbers, you can either change the Site Selector’s port numbers or change the application's port number.
Change the Site Selector's port numbers by doing the following:
Change the application's RMI port number by doing the following:
If you are experiencing problems running the Cisco CSS Controller component, it may be that another application is using one of the port numbers used by Cisco CSS Controller's ccoserver. Be aware that Cisco CSS Controller uses the following port numbers:
If another application is using one of the Cisco CSS Controller's port numbers, you can either change the port numbers for Cisco CSS Controller or change the application's port number.
Change the Cisco CSS Controller's port numbers by doing the following:
Change the application's RMI port number by doing the following:
If you are experiencing problems running the Nortel Alteon Controller component, it may be that another application is using one of the port numbers used by Nortel Alteon Controller's nalserver. Be aware that Nortel Alteon Controller uses the following port numbers:
If another application is using one of the Nortel Alteon Controller's port numbers, you can either change the port numbers for Nortel Alteon Controller or change the port numbers for the application.
Change the port numbers for Nortel Alteon Controller by doing the following:
Change the application's RMI port number by doing the following:
This problem can occur when another application is using one of the ports used by the Dispatcher. For more information, go to Checking Dispatcher port numbers.
This problem occurs when an address other than the one specified is being used. When collocating the Dispatcher and server, be sure that the server address used in the configuration is the NFA address or is configured as collocated. Also, check the host file for the correct address.
This problem has symptoms such as connections from client machines not being served or connections timing out. Check the following to diagnose this problem:
For Windows and other platforms, see also Setting up server machines for load balancing.
This problem appears when a Dispatcher high-availability environment is configured and connections from the client machines are not being served or are timing out. Check the following to correct or diagnose the problem:
The following steps are an effective way to test that high availability scripts are functioning properly:
The two reports are identical if the scripts are properly configured.
This Windows platform error occurs when the source address is not configured on an adapter. Check the following to correct or diagnose the problem.
dscontrol executor configure <ip address>
If you are using wide area support, and your advisors do not seem to work correctly, make sure that they are started on both the local and the remote Dispatchers.
An ICMP ping is issued to the servers before the advisor request. If a firewall exists between Load Balancer and the servers, ensure that pings are supported across the firewall. If this setup poses a security risk to your network, modify the java statement in dsserver to turn off all pings to the servers by adding the java property:
LB_ADV_NO_PING="true"
java -DLB_ADV_NO_PING="true"
See Using remote advisors with Dispatcher's wide area support.
When Load Balancer is connecting to a backend server that runs Windows Server 2008, the metric collection features might cause the memload.exe application to stop running unexpectedly.
The crash occurs because the Windows Server 2008 registry might not be populated with the performance keys that these tools require. This application crash would be reported from the cpuload application.
Refer to the following Knowledge Base topic from Microsoft for steps on how to address this problem: http://support.microsoft.com/kb/300956
When using Dispatcher, Microsoft IIS, and SSL, if they do not work together, there may be a problem with enabling SSL security. For more information about generating a key pair, acquiring a certificate, installing a certificate with a key pair, and configuring a directory to require SSL, see the Microsoft Information and Peer Web Services documentation.
Dispatcher uses keys to allow you to connect to a remote machine and configure it. The keys specify an RMI port for the connection. It is possible to change the RMI port for security reasons or conflicts. When you change the RMI ports, the filename of the key is different. If you have more than one key in your keys directory for the same remote machine, and they specify different RMI ports, the command line will only try the first one it finds. If it is the incorrect one, the connection will be refused. The connection will not occur unless you delete the incorrect key.
EXCLUDE-MODULE java
EXCLUDE-MODULE javaw
This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue dscontrol commands, you might see errors such as Error: Server not responding.
To avoid this problem, edit the dsserver script file to set the port used by RMI for the firewall (or other application). Change the line LB_RMISERVERPORT=10199 to LB_RMISERVERPORT=yourPort, where yourPort is a different port.
When complete, restart dsserver and open traffic for ports 10099, 10004, 10199, and 10100, or for the chosen port for the host address from which the administration console will be run.
For example: java -Djava.rmi.server.hostname="10.1.1.1"
For Windows platforms, when using Netscape as your default browser, the following error message may result: "Cannot find the file '<filename>.html' (or one of its components). Make sure the path and filename are correct and that all required libraries are available."
The problem is due to an incorrect setting for HTML file association. The solution is the following:
The graphical user interface (GUI), which is lbadmin, requires a sufficient amount of paging space to function correctly. If insufficient paging space is available, the GUI might not start up completely. If this occurs, check your paging space and increase it if necessary.
If you uninstall Load Balancer to reinstall another version and get an error when you attempt to start the Dispatcher component, check to see if Caching Proxy is installed. Caching Proxy has a dependency on one of the Dispatcher files; this file will uninstall only when Caching Proxy is uninstalled.
To avoid this problem:
If you experience a problem with the appearance of the Load Balancer GUI, check the setting for the operating system's desktop resolution. The GUI is best viewed at a resolution of 1024x768 pixels.
On Windows platforms, when you first open help windows, they sometimes disappear into the background behind existing windows. If this occurs, click the window to bring it forward again.
On Solaris each network adapter has the same MAC address by default. This works properly when each adapter is on a different IP subnet; however, in a switched environment, when multiple NICs with the same MAC and the same IP subnet address communicate with the same switch, the switch sends all traffic bound for the single MAC (and both IPs) down the same wire. Only the adapter that last put a frame on the wire sees the IP packets bound for both adapters. Solaris might discard packets for a valid IP address that arrived on the "wrong" interface.
If all network interfaces are not designated for Load Balancer as configured in ibmlb.conf, and if the NIC that is not defined in ibmlb.conf receives a frame, Load Balancer does not have the ability to process and forward the frame.
To avoid this problem, you must override the default and set a unique MAC address for each interface. Use this command:
ifconfig interface ether macAddr
For example:
ifconfig eri0 ether 01:02:03:04:05:06
On Windows platforms, you must have a network card installed and configured before starting the executor.
The AIX operating system contains a networking parameter called path MTU discovery. During a transaction with a client, if the operating system determines that it must use a smaller maximum transmission unit (MTU) for the outgoing packets, path MTU discovery has AIX create a route to remember that data. The new route is for that specific client IP and records the necessary MTU to reach it.
When the route is being created, a problem might occur on the servers resulting from the cluster being aliased on the loopback. If the gateway address for the route falls in the subnet of the cluster/netmask, AIX systems create the route on the loopback. This happens because that was the last interface aliased with that subnet.
For example, if the cluster is 9.37.54.69 and a 255.255.255.0 netmask is used, and the intended gateway is 9.37.54.1, AIX systems use the loopback for the route. This causes the server's responses to never leave the machine, and the client times out waiting. The client typically sees one response from the cluster, then the route is created and the client receives nothing more.
To address this problem, enter the following command:
This command will make the values persistent, and the values will apply to both current and future reboot values.
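The condition that triggers this problem, a gateway address that falls inside the cluster/netmask subnet, can be checked before the route is ever created. The following is a minimal sketch in POSIX shell. The function names ip_to_int and same_subnet are hypothetical helpers for illustration and are not part of Load Balancer or AIX; the addresses are the ones from the example above.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
# The function body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Hypothetical helper: succeed when two addresses share a subnet.
# Usage: same_subnet address1 address2 netmask
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Cluster 9.37.54.69 with netmask 255.255.255.0 and intended gateway 9.37.54.1.
if same_subnet 9.37.54.69 9.37.54.1 255.255.255.0; then
    echo "gateway is inside the cluster subnet"
fi
```

When the check succeeds, the server is a candidate for the loopback route behavior described in this section.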
When you set up a Wide Area Load Balancer, you must define the remote Dispatcher as a server in a cluster on your local Dispatcher. Typically, you use the non-forwarding address (NFA) of the remote Dispatcher as the destination address of the remote server. If you do this, and then set up high availability on the remote Dispatcher, it will fail. This happens because the local Dispatcher always points to the primary on the remote side when you use its NFA to access it.
To get around this problem:
When the remote primary Dispatcher comes up, it will alias this address on its adapter, allowing it to accept traffic. If a failure occurs, the address moves to the backup machine and the backup continues to accept traffic for that address.
When using lbadmin or Web administration (lbwebaccess) to load a large configuration file (roughly 200 or more add commands), the GUI may hang or display unexpected behavior, such as responding to screen changes at an extremely slow rate of speed.
This occurs because Java does not have access to enough memory to handle such a large configuration.
There is an option on the runtime environment that can be specified to increase the memory allocation pool available to Java.
The option is -Xmxn where n is the maximum size, in bytes, for the memory allocation pool. n must be a multiple of 1024 and must be greater than 2MB. The value n may be followed by k or K to indicate kilobytes, or m or M to indicate megabytes. For example, -Xmx128M and -Xmx81920k are both valid. The default value is 64M.
For example, to add this option, edit the lbadmin script file, modifying "javaw" to "javaw -Xmxn" as follows. (For AIX systems, modify "java" to "java -Xmxn"):
javaw -Xmx256m -cp $LB_CLASSPATH $LB_INSTALL_PATH $LB_CLIENT_KEYS
com.ibm.internet.nd.framework.FWK_Main 1>/dev/null 2>&1 &
java -Xmx256m -cp $LB_CLASSPATH $LB_INSTALL_PATH $LB_CLIENT_KEYS
com.ibm.internet.nd.framework.FWK_Main 1>/dev/null 2>&1 &
javaw -Xmx256m -cp $LB_CLASSPATH $LB_INSTALL_PATH $LB_CLIENT_KEYS
com.ibm.internet.nd.framework.FWK_Main 1>/dev/null 2>&1 &
java -Xmx256m -cp $LB_CLASSPATH $LB_INSTALL_PATH $LB_CLIENT_KEYS
com.ibm.internet.nd.framework.FWK_Main 1>/dev/null 2>&1 &
START javaw -Xmx256m -cp %LB_CLASSPATH% %LB_INSTALL_PATH%
%LB_CLIENT_KEYS% com.ibm.internet.nd.framework.FWK_Main
There is no recommended value for n, but it should be greater than the default option. A good place to start would be with twice the default value.
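As a small worked example of the sizing advice, the sketch below builds the option string from a multiple of the 64 MB default stated in this section. The function name xmx_option is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build a -Xmx option as a multiple of the
# documented 64 MB default.
xmx_option() {
    factor=$1
    echo "-Xmx$(( 64 * factor ))m"
}

xmx_option 2    # twice the default: -Xmx128m
xmx_option 4    # the 256 MB value used in the script examples: -Xmx256m
```

If the GUI still responds slowly with twice the default, the next multiple is a reasonable value to try.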
If Load Balancer administration (lbadmin) disconnects from the server after you update the configuration, check the version of dsserver on the server that you are attempting to configure, and ensure that it is the same as your version of lbadmin or dscontrol.
When using a remote client over a secure socks implementation, fully qualified domain names or host names might not resolve to the correct IP address in IP address format notation. The socks implementation might add specific, socks-related data to the DNS resolution.
If the IP addresses are not resolving correctly over the remote connection, specify the IP address in the IP address notation format.
To correct overlapping or undesirable fonts in the Korean Load Balancer interface:
-Monotype-TimesNewRomanWT-medium-r-normal--*-%d-75-75-*-*-ksc5601.1987-0
-Monotype-SansMonoWT-medium-r-normal--*-%d-75-75-*-*-ksc5601.1987-0
-monotype-timesnewromanwt-medium-r-normal--*-%d-75-75-p-*-microsoft-symbol
-monotype-sansmonowt-medium-r-normal--*-%d-75-75-p-*-microsoft-symbol
On Windows systems, after aliasing the MS Loopback adapter, when issuing certain commands such as hostname, the OS will incorrectly respond with the alias address instead of the local address. To correct this problem, in the network connections list, the newly added alias must be listed below the local address. This will ensure that the local address is accessed prior to the loopback alias.
To check the network connections list:
On Windows platform when using a Matrox AGP card, unexpected behavior can occur in the Load Balancer GUI. When clicking the mouse, a block of space slightly larger than the mouse pointer can become corrupted causing possible highlighting reversal or images to shift out of place on the screen. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.
On Linux systems, if dsserver is still running during the manual removal of the Load Balancer kernel module, unexpected behavior, such as system hang or javacores, can occur. When manually removing the Load Balancer kernel module, you must first stop dsserver.
If "dsserver stop" does not work, stop the java process that is running SRV_KNDConfigServer. Find its process identifier with the ps -ef | grep SRV_KNDConfigServer command, and then end the process with the kill process_id command.
After dsserver is stopped, you can safely run the "rmmod ibmlb" command to remove the Load Balancer module from the kernel.
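The process lookup described above can be sketched as a small filter. This is an illustration only: the function name configserver_pids is hypothetical, and it assumes the usual ps -ef layout in which the process identifier is the second column.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID
# (second column) of every line that mentions SRV_KNDConfigServer.
# Lines belonging to grep itself are filtered out.
configserver_pids() {
    grep 'SRV_KNDConfigServer' | grep -v 'grep' | awk '{print $2}'
}

# Typical use on the Load Balancer machine (not run here):
#   ps -ef | configserver_pids
#   kill process_id        # for each PID printed
#   rmmod ibmlb            # only after no PID is printed any more
```

Running the filter a second time after the kill command confirms that the process has ended before the kernel module is removed.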
If you are running the Dispatcher component for load balancing, it is possible to overload the computer with client traffic. The Load Balancer kernel module has the highest priority, and if it is constantly handling client packets, the rest of the system may become unresponsive. Running commands in user space may take a very long time to complete, or may never complete.
If this happens, you should begin to restructure your setup to avoid overloading the Load Balancer machine with traffic. Alternatives include spreading the load across several Load Balancer machines, or replacing the machine with a stronger and faster computer.
When trying to decide if the slow response time on the machine is due to high client traffic, consider whether this occurs during client peak traffic times. Misconfigured systems that cause routing loops can also cause the same symptoms. But before changing the Load Balancer setup, determine whether the symptoms may be due to high client load.
When using the mac-based forwarding method, Load Balancer sends packets to the servers using the cluster address, which is aliased on the loopback. Some server applications (such as SSL) require that configuration information (such as certificates) is based on the IP address. The IP address must be the cluster address that is configured on the loopback in order to match the contents of the incoming packets. If the IP address of the cluster is not used when configuring the server application, the client request is not properly forwarded to the server.
If you are using remote Web administration to configure Load Balancer, do not resize (Minimize, Maximize, Restore Down, and so on) the Netscape browser window in which the Load Balancer GUI appears. Because Netscape reloads a page every time browser windows are resized, this will cause a disconnect from host. You will need to reconnect to host each time you resize the window. If you are performing remote Web administration on a Windows platform, use Internet Explorer.
In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde may display as a pi symbol. To fix this, you must change the font properties of the command prompt window. To change the font, do the following:
Some HP-UX 11i installations are pre-configured to allow only 64 threads per process. However, some Load Balancer configurations require more than this amount. For HP-UX systems, set the threads per process to at least 256. To increase this value, use the "sam" utility to set the max_thread_proc kernel parameter. If heavy use is expected, you might need to increase max_thread_proc beyond 256.
To increase the max_thread_proc parameter, do the following:
When configuring your adapter on a Load Balancer machine, you must ensure that the following two settings are correct for the advisor to work:
On Windows platform, when configuring an adapter with more than one IP address, configure the IP address that you want affiliated to the host name first in the registry.
Because Load Balancer is dependent on InetAddress.getLocalHost() in many instances (for example, lbkeys create), multiple IP addresses aliased to a single adapter might cause problems. To avoid this problem, list the IP address to which you want your host name to resolve first in the registry. For example:
By default, when the Windows operating system detects a network outage, it clears its address resolution protocol (ARP) cache, including all static entries. After the network is available, the ARP cache is repopulated by ARP requests sent on the network.
With a high availability configuration, both servers take over primary operations when a loss of network connectivity affects one or both. When the ARP request is sent to repopulate the ARP cache, both servers respond, which causes the ARP cache to mark the entry as not valid. Therefore, the advisors are not able to create a socket to the backup servers.
Preventing the Windows operating system from clearing the ARP cache when there is a loss of connectivity solves this problem. Microsoft has published an article that explains how to accomplish this task. This article is on the Microsoft Web site, located in the Microsoft Knowledge Base, article number 239924: http://support.microsoft.com/default.aspx?scid=kb;en-us;239924.
The following is a summary of the steps, described in the Microsoft article, to prevent the system from clearing the ARP cache:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
Certain considerations must be taken when using Linux kernel 2.4.x servers and Dispatcher's MAC forwarding method. If the server has a cluster address configured on the loopback device using the ip address add command, only one cluster address can be aliased.
When aliasing multiple clusters to the loopback device use the ifconfig command, for example:
ifconfig lo:num clusterAddress netmask 255.255.255.255 up
Additionally, there are incompatibilities between the ifconfig method of configuring interfaces and the ip method of configuring interfaces. Best practice suggests that a site choose one method and use that method exclusively.
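When several clusters must be aliased, the ifconfig form shown above is repeated with a different alias number for each cluster address. The sketch below prints the commands instead of running them, so that they can be reviewed first. The function name print_loopback_aliases is hypothetical, and the 192.0.2.x addresses are placeholder documentation addresses, not values from this guide.

```shell
#!/bin/sh
# Hypothetical helper: print one ifconfig alias command per cluster
# address, numbering the loopback aliases lo:1, lo:2, and so on.
# Nothing is configured; the commands are only written to stdout.
print_loopback_aliases() {
    num=1
    for addr in "$@"; do
        echo "ifconfig lo:$num $addr netmask 255.255.255.255 up"
        num=$((num + 1))
    done
}

print_loopback_aliases 192.0.2.10 192.0.2.11
```

The printed lines can be run as root on the server once they have been checked against the cluster addresses in the Dispatcher configuration.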
When adding servers to your Dispatcher configuration, the following error message can result: "Error: Router address not specified or not valid for port method".
Use this checklist to determine the problem:
The default for the router parameter is 0, which indicates the server is local. When you set the server's router address to something other than 0, this indicates that it is a remote server, on a different subnet. For more information on the router parameter on the server add command, see dscontrol server -- configure servers.
If the server that you are adding is located on a different subnet, the router parameter should be the address of the router to be used on the local subnet to communicate with the remote server.
On Solaris systems, after starting Load Balancer scripts (such as dsserver or lbadmin) from a terminal window, if you exit from that window, the Load Balancer process also exits.
To resolve this problem, start the Load Balancer scripts with the nohup command. For example: nohup dsserver. This command prevents the processes started from the terminal session from receiving a hangup signal from the terminal when it exits, allowing the processes to continue even after the terminal session has ended. Use the nohup command in front of any Load Balancer scripts that you want to continue to process beyond the end of a terminal session.
Loading a Load Balancer configuration might take a long time due to Domain Name System (DNS) calls that are made to resolve and verify the server address.
If the DNS of the Load Balancer machine is configured incorrectly, or if DNS in general takes a long time, loading the configuration slows down because the Java processes are sending DNS requests on the network.
A workaround for this is to add your server addresses and hostnames to your local /etc/hosts file.
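The workaround can be sketched as a helper that appends an entry only when the host name is not already listed. This is a minimal illustration: the function name add_hosts_entry, the address, and the host name are all hypothetical, and the presence check is a simple whole-word match rather than a full parse of the hosts file.

```shell
#!/bin/sh
# Hypothetical helper: append "address<TAB>hostname" to a hosts-format
# file unless the hostname already appears in it as a whole word.
# Usage: add_hosts_entry file address hostname
add_hosts_entry() {
    file=$1
    addr=$2
    name=$3
    if grep -qwF -- "$name" "$file" 2>/dev/null; then
        return 0    # already listed; leave the file unchanged
    fi
    printf '%s\t%s\n' "$addr" "$name" >> "$file"
}

# Typical use on the Load Balancer machine (as root, not run here):
#   add_hosts_entry /etc/hosts 192.0.2.21 server1.example.com
```

Trying the helper on a copy of the hosts file first is a low-risk way to confirm the result before changing the real file.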
If high availability is configured, the cluster addresses may be configured on both machines for a brief period and cause the following error message to occur: There is an IP address conflict with another system on the network. In this case, you can safely ignore the message. It is possible for a cluster address to be briefly configured on both high availability machines at the same time, especially during startup of either machine, or when a takeover has been initiated.
Check the go* scripts to ensure they are correctly configuring and unconfiguring cluster addresses. If you have invoked a configuration file and have go* scripts installed, ensure you do not have any "executor configure" command statements for your cluster addresses in your configuration file, as this will conflict with the configure and unconfigure commands in the go* scripts.
For more information on go* scripts when configuring high availability, see Using scripts.
This problem may occur when the go scripts do not run on either primary or backup machine. The go scripts cannot run if dsserver is not started on both machines. Check both machines and make sure dsserver is running.
Client requests that result in large page responses time out if the maximum transmission unit (MTU) is not set properly on the Dispatcher machine. For the Dispatcher component's cbr and nat forwarding methods, this can occur because Dispatcher uses a default MTU value rather than negotiating the value.
The MTU is set on each operating system based on the type of communication media (for example, Ethernet or Token-Ring). Routers from the local segment might have a smaller MTU set if they connect to a different type of communication media. Under normal TCP traffic, an MTU negotiation occurs during the connection setup, and the smallest MTU is used to send data between the machines.
Dispatcher does not support MTU negotiation for Dispatcher's cbr or nat forwarding method because it is actively involved as an endpoint for TCP connections. For cbr and nat forwarding, Dispatcher defaults the MTU value to 1500. This value is the typical MTU size for standard Ethernet, so most customers do not need to adjust this setting.
When using Dispatcher's cbr or nat forwarding method, if you have a router to the local segment that has a lower MTU, you must set the MTU on the Dispatcher machine to match the lower MTU.
To resolve this problem, use the following command to set the maximum segment size (mss) value: dscontrol executor set mss new_value
For example:
dscontrol executor set mss 1400
The default for mss is 1460.
The mss setting does not apply for Dispatcher's mac forwarding method or any non-Dispatcher component of Load Balancer.
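The defaults in this section (an MTU of 1500 and an mss of 1460) differ by 40 bytes, which matches the usual 20-byte IPv4 header plus 20-byte TCP header with no options. Assuming that relationship holds for your network, a suitable mss value can be derived from the lowest MTU on the path. The function name mss_for_mtu and the MTU of 1440 are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: derive an mss value from an MTU, assuming IPv4
# and TCP headers of 20 bytes each and no header options.
mss_for_mtu() {
    mtu=$1
    echo $(( mtu - 40 ))
}

# Example: a router on the local segment with an MTU of 1440 (assumed).
# Prints the command rather than running it.
echo "dscontrol executor set mss $(mss_for_mtu 1440)"
```

For the assumed MTU of 1440 this prints the same command as the example above, dscontrol executor set mss 1400.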
When more than one IP address is on a Windows system and the hosts file does not specify the address to associate with the host name, the operating system chooses the smallest address to associate with the host name.
To resolve this problem, update the c:\Windows\system32\drivers\etc\hosts file with your machine host name and the IP address that you want to associate with the host name.
IMPORTANT: The IP address cannot be a cluster address.
When using high availability on Linux for S/390 machines with the qeth network driver, the active and standby Dispatchers may fail to synchronize. This problem might be limited to Linux Kernel 2.6.
If this problem occurs, use the following workaround:
Define a channel-to-channel (CTC) network device between the active and standby Dispatcher images and add a heartbeat between the two CTC endpoint IP addresses.
With the high availability function for Load Balancer, a partner machine can take over load balancing if the primary partner fails or is shut down. To maintain connections between the high availability partners, connection records are passed between the two machines. When the backup partner takes over the load balancing function, the cluster IP address is removed from the backup machine and added to the new primary machine. There are numerous timing and configuration considerations that can affect this takeover operation.
The tips listed in this section can help alleviate problems that arise from high availability configuration problems such as:
The following tips are helpful for successful configuration of high availability on your Load Balancer machines.
Examples of high availability commands are:
dscontrol highavailability heartbeat add ...
dscontrol highavailability backup add ...
dscontrol highavailability reach add ...
In most cases, you must position the high availability definitions at the end of the file. The cluster, port, and server statements must be placed before the high availability statements. This is because when high availability synchronizes, it looks for the cluster, port, and server definitions when a connection record is received.
If the cluster, port, and server do not exist, the connection record is dropped. If a takeover occurs and the connection record has not been replicated on the partner machine, the connection fails.
The exception to this rule is when using collocated servers that are configured with the MAC-forwarding method. In this case, the high availability statements must come before the collocated server statements. If the high availability statements are not before the collocated server statements, Load Balancer receives a request for the collocated server, but it appears the same as an incoming request for the cluster and is load balanced. This can lead to a looping of the packets on the network and lead to excess traffic. When the high availability statements are placed before the collocated server, Load Balancer knows that it should not forward incoming traffic unless it is in the ACTIVE state.
To correct this behavior, add a sleep delay in the goActive script. The amount of time needed to sleep is deployment dependent. It is recommended that you start with a sleep delay time of 10.
By default, the machines attempt to communicate with each other every one half second and will detect a failure after four failed attempts. If you have a busy machine, this might cause failovers to occur when the system is still functioning properly. You can increase the number of times until failure by issuing:
dscontrol executor set hatimeout <value>
To accomplish this, old connections must not remain in memory for an extended amount of time. In particular, there have been issues with LDAP ports and large staletimeout periods (in excess of one day). Setting a large staletimeout period causes old connections to remain in memory, which causes more connection records to be passed at synchronization, and also more memory usage on both machines.
If the synchronization fails with a reasonable staletimeout period, you can increase the synchronization timeout by issuing:
e xm 33 5 new_timeout
This command is not stored in the configuration file when it is saved, so you must manually add it to the configuration file if you want this setting to persist between shutdowns.
The timeout value is stored in one half seconds; therefore, the default value for new_timeout is 100 (50 seconds).
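Because the value is expressed in half seconds, a timeout chosen in seconds must be doubled before it is used as new_timeout. The sketch below shows the conversion; the function name seconds_to_half_seconds and the 90-second target are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: convert a timeout in seconds to the half-second
# units used for new_timeout.
seconds_to_half_seconds() {
    echo $(( $1 * 2 ))
}

seconds_to_half_seconds 50    # the default: 100
seconds_to_half_seconds 90    # an assumed larger timeout: 180
```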
In general, when using the MAC forwarding method, servers in the Load Balancer configuration must all be on the same network segment regardless of the platform. Active network devices such as routers, bridges, and firewalls interfere with Load Balancer. This is because Load Balancer functions as a specialized router, modifying only the link-layer headers to its next and final hop. Any network topology in which the next hop is not the final hop is not valid for Load Balancer.
There is a limitation for zSeries and S/390 servers that share the OSA card, because this adapter operates differently than most network cards. The OSA card has its own virtual link layer implementation, which has nothing to do with ethernet, that is presented to the Linux and z/OS hosts behind it. Effectively, each OSA card looks just like ethernet to ethernet hosts (but not to the OSA hosts), and hosts that use it respond to it as if it is ethernet.
The OSA card also performs some functions that relate to the IP layer directly. Responding to ARP (address resolution protocol) requests is one example of a function that it performs. Another is that shared OSA routes IP packets based on destination IP address, instead of on ethernet address as a layer 2 switch. Effectively, the OSA card is a bridged network segment unto itself.
Load Balancer that runs on an S/390 Linux or zSeries Linux host can forward to hosts on the same OSA or to hosts on the ethernet. All the hosts on the same shared OSA are effectively on the same segment.
Load Balancer can forward out of a shared OSA because of the nature of the OSA bridge. The bridge knows the OSA port that owns the cluster IP. The bridge knows the MAC address of hosts directly connected to the ethernet segment. Therefore, Load Balancer can MAC-forward across one OSA bridge.
However, Load Balancer cannot forward into a shared OSA. This includes the Load Balancer on an S/390 Linux when the backend server is on a different OSA card than the Load Balancer. The OSA for the backend server advertises the OSA MAC address for the server IP, but when a packet arrives with the ethernet destination address of the server's OSA and the IP of the cluster, the server's OSA card does not know which of its hosts, if any, should receive that packet. The same principles that permit OSA-to-ethernet MAC-forwarding to work out of one shared OSA do not hold when trying to forward into a shared OSA.
Workaround:
In Load Balancer configurations that use zSeries or S/390 servers that have OSA cards, there are two approaches you can take to work around the problem that has been described.
If the servers in the Load Balancer configuration are on the same zSeries or S/390 platform type, you can define point-to-point (CTC or IUCV) connections between Load Balancer and each server. Set up the endpoints with private IP addresses. The point-to-point connection is used for Load Balancer-to-server traffic only. Then add the servers with the IP address of the server endpoint of the tunnel. With this configuration, the cluster traffic comes through the Load Balancer OSA card and is forwarded across the point-to-point connection where the server responds through its own default route. The response uses the server's OSA card to leave, which might or might not be the same card.
If the servers in the Load Balancer configuration are not on the same zSeries or S/390 platform type, or if it is not possible to define a point-to-point connection between Load Balancer and each server, it is recommended that you use Load Balancer's Generic Routing Encapsulation (GRE) feature, which is a protocol that permits Load Balancer to forward across routers.
When using GRE, the client->cluster IP packet is received by Load Balancer, encapsulated, and sent to the server. At the server, the original client->cluster IP packet is decapsulated, and the server responds directly to the client. The advantage of using GRE is that Load Balancer sees only the client-to-server traffic, not the server-to-client traffic. The disadvantage is that it lowers the maximum segment size (MSS) of the TCP connection due to encapsulation overhead.
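As a rough illustration of the MSS reduction, the following sketch computes the TCP MSS with and without GRE encapsulation. It assumes a 1500-byte Ethernet MTU, IPv4 and TCP headers of 20 bytes each with no options, and a GRE header of 8 bytes (a 4-byte base header plus a 4-byte key). These figures are assumptions for illustration only; the actual overhead depends on your network and configuration.

```shell
# Illustrative arithmetic only; every size below is an assumption,
# not a value reported by Load Balancer.
MTU=1500      # assumed Ethernet MTU
IP_HDR=20     # IPv4 header without options
TCP_HDR=20    # TCP header without options
GRE_HDR=8     # GRE base header (4 bytes) plus key field (4 bytes)

# MSS when the packet is sent without encapsulation
plain_mss=$((MTU - IP_HDR - TCP_HDR))

# With GRE, an outer IPv4 header and the GRE header are added to each packet
gre_mss=$((MTU - IP_HDR - GRE_HDR - IP_HDR - TCP_HDR))

echo "MSS without GRE: $plain_mss"
echo "MSS with GRE:    $gre_mss"
```

Under these assumptions, each segment carries 28 fewer bytes of payload when GRE is used.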
To configure Load Balancer to forward with GRE encapsulation, add the servers using the following command:
dscontrol server add cluster_add:port:backend_server router backend_server
Where router backend_server is valid if Load Balancer and the backend server are on the same IP subnet. Otherwise, specify the valid next-hop IP address as the router.
To configure Linux systems to perform native GRE decapsulation, for each backend server, issue the following commands:
modprobe ip_gre
ip tunnel add grelb0 mode gre ikey 3735928559
ip link set grelb0 up
ip addr add cluster_addr dev grelb0
When running Load Balancer configured with the manager and advisor features, large memory leaks can occur on some Red Hat Linux versions. The Java memory leak increases if you configure a small time-interval setting for the advisor.
The memory leak can be caused by the combination of the IBM Java SDK versions of the JVM and the Native POSIX Thread Library (NPTL), the enhanced threading library that is shipped with some Linux distributions, such as Red Hat Enterprise Linux 3.0.
Refer to http://www.ibm.com/developerworks/java/jdk/linux/tested.html for the latest information on Linux systems and the IBM Java SDK shipped with these systems.
As a problem determination tool, use the vmstat or ps command to detect memory leaks.
To fix the memory leak, disable the NPTL library by issuing the following command before you start Load Balancer:
export LD_ASSUME_KERNEL=2.4.10
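To watch for the leak with the ps command, you can sample the resident set size (RSS) of the Java process at intervals and compare the first and last values. The following is a minimal sketch; the function names are hypothetical, and it assumes a ps implementation that accepts the -o rss= and -p options.

```shell
# Print the resident set size (RSS) of one process, in kilobytes.
# Usage: rss_kb PID
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# Read one RSS value per line on standard input and print the growth
# between the first and the last sample.
rss_growth() {
    awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR > 0) print last - first }'
}

# Example (PID is the process ID of the Load Balancer Java process):
# for i in 1 2 3 4 5; do rss_kb "$PID"; sleep 60; done | rss_growth
```

A value that keeps growing across repeated runs while the load is steady suggests a leak; a single sample is not conclusive.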
On SUSE Linux Enterprise Server 9, when using the MAC forwarding method, the Dispatcher report might indicate that the packet was forwarded (packet count increases); however, the packet never reaches the backend server.
You might observe one or both of the following when this problem occurs:
ip_finish_output2: No header cache and no neighbour!
ICMP Destination unreachable: Fragmentation Needed
This problem might occur due to the iptables NAT module that is loaded. On SLES 9, there is a possible, but unconfirmed, error in this version of iptables that causes strange behavior when interacting with Dispatcher.
Solution:
Unload the iptables NAT module and Connection Tracking module.
For example:
# lsmod | grep ip
iptable_filter 3072 0
iptable_nat 22060 0
ip_conntrack 32560 1 iptable_nat
ip_tables 17280 2 iptable_filter,iptable_nat
ipv6 236800 19
# rmmod iptable_nat
# rmmod ip_conntrack
Remove the modules in the order of their usage. Specifically, you can remove a module only if the reference count (last column in lsmod output) is zero. If you have configured any rules in iptables, you must remove them. For example: iptables -t nat -F.
The iptable_nat module uses ip_conntrack, so you must first remove iptable_nat module, and then remove ip_conntrack module.
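The reference-count rule can be checked mechanically. The following sketch reads lsmod output and prints the modules whose reference count is zero, that is, the modules that can be removed at this point. The function name is hypothetical, and the sketch assumes the usual lsmod column layout (name, size, reference count, users).

```shell
# Read lsmod output on standard input and print every module whose
# reference count (third column) is zero.
removable_modules() {
    awk '$1 != "Module" && $3 == "0" { print $1 }'
}

# Example: lsmod | grep ip | removable_modules
```

After you remove a module, run the check again, because removing it can drop the reference count of the modules that it used to zero.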
On Windows systems, if you are running Load Balancer's high availability feature, goScripts are used to configure the cluster IP on the active Load Balancer and to unconfigure the cluster IP on the backup system when a takeover occurs. If the goScript that configures the cluster IP address on the active machine runs before the goScript that unconfigures the cluster IP address on the backup machine, problems might occur. You might see a popup window that tells you that the system has detected an IP address conflict. If you run the ipconfig /all command, you might also see that there is a 0.0.0.0 IP address on the machine.
Solution:
Issue the following command to manually unconfigure the cluster IP address from the primary machine:
dscontrol executor unconfigure clusterIP
This removes the 0.0.0.0 address from the Windows IP stack.
After the high availability partner releases the cluster IP address, issue the following command to manually add the cluster IP back:
dscontrol executor configure clusterIP
After this command is issued, look for the cluster IP address on the Windows IP stack again by issuing the following command:
ipconfig /all
Linux iptables can interfere with load balancing of traffic and must be disabled on the Dispatcher machine.
Issue the following command to determine if iptables are loaded:
lsmod | grep ip_tables
The output from the preceding command might be similar to this:
ip_tables 22400 3 iptable_mangle,iptable_nat,iptable_filter
Issue the following command for each iptable listed in the output to display the rules for the tables:
iptables -t <short_name> -L
For example:
iptables -t mangle -L
iptables -t nat -L
iptables -t filter -L
If iptable_nat is loaded, it must be unloaded. Because iptable_nat has a dependency on ip_conntrack, ip_conntrack also must be removed. Issue the following command to unload these two modules:
rmmod iptable_nat ip_conntrack
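The short table names passed to iptables -t can be derived from the names of the loaded iptable_* modules. The following sketch does this from lsmod output; the function name is hypothetical, and the commented loop that calls iptables requires root privileges.

```shell
# Read lsmod output on standard input and print the table name
# (the part after "iptable_") of every loaded iptable_* module.
loaded_tables() {
    awk '$1 ~ /^iptable_/ { sub(/^iptable_/, "", $1); print $1 }'
}

# Example: list the rules of every loaded table
# lsmod | loaded_tables | while read -r table; do
#     iptables -t "$table" -L
# done
```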
During the Load Balancer installation process, a Java file set is also installed. Load Balancer is the only application that uses the Java version that is installed with the product. Do not upgrade this version of the Java file set on your own. If a problem requires an upgrade of the Java file set, report the problem to IBM Service so that the Java file set that is shipped with Load Balancer can be upgraded to an official fix level.
On Microsoft Windows operating systems, persistent connections might drop during a high availability takeover. This problem exists only when you have a collocated server that uses the MAC forwarding method.
When the cluster IP address is deleted, either from the ethernet interface or the loopback interface, any connections on that IP address are released. When the operating system receives a packet on a connection that has been released, it sends a RST response back to the client and the connection is terminated.
If you cannot tolerate connections being dropped during a high availability takeover, you must not use a collocated server on Windows operating systems when you use the MAC forwarding method.
This problem can occur when another application is using one of the ports used by CBR. For more information, go to Checking CBR port numbers.
EXCLUDE-MODULE java
EXCLUDE-MODULE javaw
This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue cbrcontrol commands, you might see errors such as Error: Server not responding.
To avoid this problem, edit the cbrserver script file to set the port used by RMI for the firewall (or other application). Change the line LB_RMISERVERPORT=11199 to LB_RMISERVERPORT=yourPort, where yourPort is a different port.
When complete, restart cbrserver and open traffic for ports 11099, 10004, 11199, and 11100, or for the chosen port for the host address from which the administration console will be run.
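Before you start the server, you can check whether another application is already listening on one of these ports. The following sketch filters netstat -an output for a list of port numbers. The function name is hypothetical, and the sketch assumes that listening sockets are marked with LISTEN (or LISTENING) and that the local address ends in a colon or period followed by the port number.

```shell
# Read "netstat -an" output on standard input and print each of the
# given ports that appears in a listening socket line.
# Usage: netstat -an | ports_in_use PORT [PORT...]
ports_in_use() {
    awk -v ports="$*" '
        BEGIN { n = split(ports, want, " ") }
        /LISTEN/ {
            for (i = 1; i <= n; i++)
                for (f = 1; f <= NF; f++)
                    if ($f ~ ("[.:]" want[i] "$"))
                        print want[i]
        }
    ' | sort -u
}

# Example for the CBR ports named above:
# netstat -an | ports_in_use 11099 10004 11199 11100
```

The same check applies to the other components if you pass their port numbers instead.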
Caching Proxy and CBR have been started, but requests are not being load balanced. This error can occur if you start Caching Proxy before starting the executor. If this happens, the stderr log for Caching Proxy will contain the following error message: "ndServerInit: Could not attach to executor." To avoid this problem, start the executor before starting Caching Proxy.
On Solaris systems, the cbrcontrol executor start command returns: "Error: Executor was not started." This error occurs if you do not configure the IPC (Inter-process Communication) for the system so that the maximum size of a shared memory segment and semaphore IDs are bigger than the operating system's default. In order to increase the size of the shared memory segment and semaphore IDs, you must edit the /etc/system file. For more information on how to configure this file, see the section on modifying the system defaults for IPCs (Inter-process Communication).
If the URL rule does not work, this can be a result of either a syntactical or configuration error. For this problem check the following:
On Windows platforms, unexpected behavior can occur in the Load Balancer GUI when a Matrox AGP card is used. When you click the mouse, a block of space slightly larger than the mouse pointer can become corrupted, which can reverse highlighting or shift images out of place on the screen. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.
If you are using remote Web administration to configure Load Balancer, do not resize (Minimize, Maximize, Restore Down, and so on) the Netscape browser window in which the Load Balancer GUI appears. Because Netscape reloads a page every time browser windows are resized, this will cause a disconnect from host. You will need to reconnect to host each time you resize the window. If you are performing remote Web administration on a Windows platform, use Internet Explorer.
In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde might display as a pi symbol. To fix this, you must change the font properties of the command prompt window. To change the font, do the following:
Some HP-UX 11i installations are pre-configured to allow only 64 threads per process. However, some Load Balancer configurations require more than this amount. For HP-UX systems, set the threads per process to at least 256. To increase this value, use the "sam" utility to set the max_thread_proc kernel parameter. If heavy use is expected, you might need to increase max_thread_proc beyond 256.
To increase max_thread_proc, refer to the steps to increase the max_thread_proc parameter.
When configuring your adapter on a Load Balancer machine, you must ensure that the following two settings are correct for the advisor to work:
Refer to the section on disabling task offloading for instructions on configuring this setting.
On Windows platforms, when configuring an adapter with more than one IP address, configure the IP address that you want associated with the host name first in the registry.
Because Load Balancer is dependent on InetAddress.getLocalHost() in many instances (for example, lbkeys create), multiple IP addresses aliased to a single adapter might cause problems. To avoid this problem, list the IP address to which you want your host name to resolve first in the registry.
To address this issue, reorder the adapters in the Advanced Settings for the Control Panel’s Network Connections option. For example:
This problem can occur when another application is using one of the ports used by Site Selector. For more information, go to Checking Site Selector port numbers.
Symptom: Site Selector component does not round-robin incoming requests from Solaris clients.
Possible cause: Solaris systems run a name service cache daemon. If this daemon is running, the subsequent resolver request is answered from this cache instead of querying Site Selector.
Solution: Turn off the name service cache daemon on the Solaris machine.
EXCLUDE-MODULE java
EXCLUDE-MODULE javaw
This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue sscontrol commands, you might see errors such as Error: Server not responding.
To avoid this problem, edit the ssserver script file to set the port used by RMI for the firewall (or other application). Change the line LB_RMISERVERPORT=10199 to LB_RMISERVERPORT=yourPort, where yourPort is a different port.
When complete, restart ssserver and open traffic for ports 12099, 10004, 12199, and 12100, or for the chosen port for the host address from which the administration console will be run.
Site Selector must be able to participate in a DNS. All the machines involved in the configuration should also be participants of this system. Windows systems do not always require the configured host name to be in the DNS. Site Selector requires that its host name be defined in the DNS to start properly.
Verify this host is defined in the DNS. Edit the ssserver.cmd file and remove the "w" from "javaw". This should provide more information about errors.
Site Selector's name server does not bind to any one address on the machine. It will respond to requests destined for any valid IP on the machine. Site Selector relies on the operating system to route the response back to the client. If the Site Selector machine has multiple adapters and any number of them are attached to the same subnet, it is possible that the operating system will send the response to the client from a different address than the one on which the request was received. Some client applications will not accept a response received from an address other than the one to which the request was sent. As a result, the name resolution will appear to fail.
On Windows platforms, unexpected behavior can occur in the Load Balancer GUI when a Matrox AGP card is used. When you click the mouse, a block of space slightly larger than the mouse pointer can become corrupted, which can reverse highlighting or shift images out of place on the screen. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.
If you are using remote Web administration to configure Load Balancer, do not resize (Minimize, Maximize, Restore Down, and so on) the Netscape browser window in which the Load Balancer GUI appears. Because Netscape reloads a page every time browser windows are resized, this will cause a disconnect from host. You will need to reconnect to host each time you resize the window. If you are performing remote Web administration on a Windows platform, use Internet Explorer.
In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde might display as a pi symbol. To fix this, you must change the font properties of the command prompt window. To change the font, do the following:
Some HP-UX 11i installations are pre-configured to allow only 64 threads per process. However, some Load Balancer configurations require more than this amount. For HP-UX systems, set the threads per process to at least 256. To increase this value, use the "sam" utility to set the max_thread_proc kernel parameter. If heavy use is expected, you might need to increase max_thread_proc beyond 256.
To increase max_thread_proc, refer to the steps to increase the max_thread_proc parameter.
When configuring your adapter on a Load Balancer machine, you must ensure that the following two settings are correct for the advisor to work:
Refer to the section on disabling task offloading for instructions.
This problem can occur when another application is using one of the ports used by the Cisco CSS Controller's ccoserver. For more information, see Checking Cisco CSS Controller port numbers.
EXCLUDE-MODULE java
EXCLUDE-MODULE javaw
This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue ccocontrol commands, you might see errors such as Error: Server not responding.
To avoid this problem, edit the ccoserver script file to set the port used by RMI for the firewall (or other application). Change the line CCO_RMISERVERPORT=14199 to CCO_RMISERVERPORT=yourPort, where yourPort is a different port.
When complete, restart ccoserver and open traffic for ports 13099, 10004, 13199, and 13100, or for the chosen port for the host address from which the administration console will be run.
This problem can occur when a valid product license is missing. When you attempt to start ccoserver, you receive the following message:
Your license has expired. Contact your local IBM
representative or authorized IBM reseller.
To correct this problem:
On Windows platforms, unexpected behavior can occur in the Load Balancer GUI when a Matrox AGP card is used. When you click the mouse, a block of space slightly larger than the mouse pointer can become corrupted, which can reverse highlighting or shift images out of place on the screen. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.
You might experience a connection error, due to incorrect configuration settings, when adding a consultant. To fix this problem:
To fix this problem
Increase the consultant loglevel and retry the command. If it fails again, search the log for SNMP timeout or other SNMP communication errors.
If you are using remote Web administration to configure Load Balancer, do not resize (Minimize, Maximize, Restore Down, and so on) the Netscape browser window in which the Load Balancer GUI appears. Because Netscape reloads a page every time browser windows are resized, this will cause a disconnect from host. You will need to reconnect to host each time you resize the window. If you are performing remote Web administration on a Windows platform, use Internet Explorer.
In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde might display as a pi symbol. To fix this, you must change the font properties of the command prompt window. To change the font, do the following:
Some HP-UX 11i installations are pre-configured to allow only 64 threads per process. However, some Load Balancer configurations require more than this amount. For HP-UX systems, set the threads per process to at least 256. To increase this value, use the "sam" utility to set the max_thread_proc kernel parameter. If heavy use is expected, you might need to increase max_thread_proc beyond 256.
To increase max_thread_proc, refer to the steps to increase the max_thread_proc parameter.
This problem can occur when another application is using one of the ports used by the Nortel Alteon Controller's nalserver. For more information, see Checking Nortel Alteon Controller port numbers.
EXCLUDE-MODULE java
EXCLUDE-MODULE javaw
This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue nalcontrol commands, you might see errors such as Error: Server not responding.
To avoid this problem, edit the nalserver script file to set the port used by RMI for the firewall (or other application). Change the line NAL_RMISERVERPORT=14199 to NAL_RMISERVERPORT=yourPort, where yourPort is a different port.
When complete, restart nalserver and open traffic for ports 14099, 10004, 14199, and 14100, or for the chosen port for the host address from which the administration console will be run.
This problem can occur when a valid product license is missing. When you attempt to start nalserver, you receive the following message:
Your license has expired. Contact your local IBM
representative or authorized IBM reseller.
To correct this problem:
On Windows platforms, unexpected behavior can occur in the Load Balancer GUI when a Matrox AGP card is used. When you click the mouse, a block of space slightly larger than the mouse pointer can become corrupted, which can reverse highlighting or shift images out of place on the screen. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.
If you are using remote Web administration to configure Load Balancer, do not resize (Minimize, Maximize, Restore Down, and so on) the Netscape browser window in which the Load Balancer GUI appears. Because Netscape reloads a page every time browser windows are resized, this will cause a disconnect from host. You will need to reconnect to host each time you resize the window. If you are performing remote Web administration on a Windows platform, use Internet Explorer.
You might experience a connection error, due to incorrect configuration settings, when adding a consultant. To fix this problem:
To fix this problem
Increase the consultant loglevel and retry the command. If it fails again, search the log for SNMP timeout or other SNMP communication errors.
In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde might display as a pi symbol. To fix this, you must change the font properties of the command prompt window. To change the font, do the following:
Some HP-UX 11i installations are pre-configured to allow only 64 threads per process. However, some Load Balancer configurations require more than this amount. For HP-UX systems, set the threads per process to at least 256. To increase this value, use the "sam" utility to set the max_thread_proc kernel parameter. If heavy use is expected, you might need to increase max_thread_proc beyond 256.
To increase max_thread_proc, refer to the steps to increase the max_thread_proc parameter.
You must use the full metric name for user-written metrics on Metric Servers running on Windows platform. For example, you must specify usermetric.bat instead of usermetric. The name usermetric is valid on the command line, but will not work when run from within the runtime environment. If you do not use the full metric name, you will receive a Metric Server IOException. Set the LOG_LEVEL variable to a value of 3 in the metricserver command file, then check the log output. In this example, the exception appears as:
... java.io.IOException: CreateProcess: usermetric error=2
There can be several reasons why Metric Server is not reporting load information to Load Balancer. To determine the cause, perform the following checks:
You can also resolve this problem by specifying the host name in the Java property java.rmi.server.hostname in the metricserver script.
The Metric Server log reports this error message after key files have been transferred to the server.
This error is logged when the key file fails authorization with the paired key due to corruption in the pair. To correct this problem try the following:
While running Metric Server under heavy stress on a multi-processor AIX platform (4.3.3, 32-bit 5.1, or 64-bit 5.1), output from ps -vg command may be corrupt. For example:
55742 - A 88:19 42 18014398509449680 6396 32768 22 36 2.8 1.0 java -Xms
The SIZE and/or RSS field of the ps command may show an excessive amount of memory being used.
This is a known AIX kernel problem. APAR IY33804 corrects this problem. Obtain the fix from AIX support at http://techsupport.services.ibm.com/server/fixes, or contact your local AIX support representative.
In a two-tier Load Balancer configuration, if Site Selector (first-tier) is load balancing across a pair of Dispatcher high-availability partners (second-tier), there are steps you must complete to configure the metric server component. You must configure metric server to listen on a new IP address that is specifically for metric server's use. On the two high-availability Dispatcher machines, metric server is active only on the active Dispatcher.
To correctly configure this setup, complete the following steps:
For example:
ifconfig en0 delete 9.27.23.61
ifconfig lo0 alias 9.27.23.61 netmask 255.255.255.0
route add 9.27.23.61 127.0.0.1
metricserver stop
# Sleep either max 60 seconds or until the metricserver stops
let loopcount=0
while [[ "$loopcount" -lt "60" && `ps -ef | grep AgentStop | grep -c -v grep` -eq "1" ]]
do
sleep 1
let loopcount=$loopcount+1
done
route delete 9.27.23.61
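The wait loop in the preceding example (poll once per second, give up after 60 seconds) can be written as a reusable function. The following is a minimal sketch in POSIX shell; the function names wait_while and agent_stopping are hypothetical and are not part of Load Balancer.

```shell
# Run a check command once per second until it fails or until the time
# limit is reached. Returns 0 if the check stopped succeeding, and 1 if
# the limit was reached first.
# Usage: wait_while MAX_SECONDS CHECK_COMMAND [ARGS...]
wait_while() {
    limit=$1
    shift
    elapsed=0
    while "$@"; do
        if [ "$elapsed" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Hypothetical check: succeeds while an AgentStop process is still listed.
agent_stopping() {
    ps -ef | grep AgentStop | grep -v grep > /dev/null
}

# Example: wait_while 60 agent_stopping
```

Separating the check from the loop makes it possible to reuse the same timeout logic for other conditions.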
For example:
call netsh interface ip delete address "Local Area Connection" addr=9.27.23.61
call netsh interface ip add address "Local Area Connection 2" addr=9.27.23.61 mask=255.255.255.0
sleep 3
metricserver stop
When running on multi-CPU Solaris machines, metricserver, cpuload, and memload scripts can produce unwanted console messages. This behavior is due to the use of the vmstat system command to gather CPU and memory statistics from the kernel. Some messages that vmstat returns indicate that the state of the kernel has changed. The scripts are unable to handle these messages, resulting in unnecessary console messages from the shell.
Examples of these console messages are:
/opt/ibm/edge/lb/ms/script/memload[29]: TOTAL=: syntax error
/opt/ibm/edge/lb/ms/script/memload[31]: LOAD=4*100/0: divide by zero
/opt/ibm/edge/lb/ms/script/memload[29]: TOTAL=659664+: more tokens expected
These messages can be ignored.
This problem might be the result of the key files losing their integrity during transfer to the client.
If you are using FTP to transfer your key files from the Load Balancer machine to the backend server, ensure that you are using binary mode to put or get key files to or from the FTP server.
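One way to confirm that a key file survived the transfer is to compare a checksum of the file on both machines. The following sketch uses the POSIX cksum utility; the function name is hypothetical, and the file path in the example is a placeholder.

```shell
# Print the CRC and byte count of a file without the file name, so that
# the output from two machines can be compared directly.
# Usage: key_fingerprint FILE
key_fingerprint() {
    cksum < "$1"
}

# Example: run on the Load Balancer machine and on the backend server,
# then compare the two output lines.
# key_fingerprint /path/to/keyfile
```

If the two lines differ, the file was altered in transit (for example, by an ASCII-mode transfer that converts line endings) and must be transferred again in binary mode.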