Troubleshooting

This chapter helps you detect and resolve problems associated with Load Balancer.

Gathering troubleshooting information

Use the information in this section to gather the data that IBM service requires. The information is divided into the following subjects.

General information (always required)

For the Dispatcher component only, there is a problem determination tool that automatically gathers operating system-specific data and component-specific configuration files. To run this tool, type lbpd from the appropriate directory:

This problem determination tool packages the data into files as follows:

Before you call IBM service, have the following information available.

High availability (HA) problems

Gather the following required information for problems in an HA environment.

Advisor problems

Gather the following required information for advisor problems; for example, when advisors are mistakenly marking servers as down.

Note:
When writing custom advisors, it is helpful to use ADVLOG(loglevel,message) calls to verify that the advisor is working correctly.

The ADVLOG call prints statements to the advisor log file when the level is less than the logging level associated with the advisors. A logging level of 0 causes the statement to always be written. You cannot use ADVLOG from the constructor: the log file is not created until immediately after the custom advisor's constructor has completed, because the log file name depends on information that is set in the constructor.

Another way to debug your custom advisor avoids this limitation. You can use System.out.println(message) statements to print messages to a console window. Edit the dsserver script and change javaw to java so that the print statements appear in the window. The window used to start dsserver must be kept open for the prints to appear. On Windows platforms, you must stop the Dispatcher from running as a service and start it manually from a command window to see the messages.

Refer to Programming Guide for Edge Components for more information on ADVLOG.
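To make the ADVLOG filtering rule concrete, the following Java sketch models it with a hypothetical class. AdvisorLogModel, shouldWrite, and log are illustrative names only and are not part of the Load Balancer custom advisor API; see the Programming Guide for the real ADVLOG. The sketch assumes the logging level is an integer and follows the rule as stated above: level 0 is always written, and any other level is written only when it is less than the advisor's logging level. The constructor uses System.out.println, because a real advisor's log file is not yet available at that point.

```java
// Hypothetical model of the ADVLOG filtering rule described in this section.
// AdvisorLogModel, shouldWrite, and log are illustrative names only; they are
// not the Load Balancer custom advisor API.
public class AdvisorLogModel {
    // Logging level associated with the advisor (assumed to be an int).
    private final int advisorLogLevel;

    public AdvisorLogModel(int advisorLogLevel) {
        this.advisorLogLevel = advisorLogLevel;
        // In a real custom advisor the log file does not exist yet while the
        // constructor runs, so constructor diagnostics go to the console.
        System.out.println("constructor: advisor logging level = " + advisorLogLevel);
    }

    // Level 0 is always written; any other level is written only when it is
    // less than the advisor's logging level.
    public boolean shouldWrite(int messageLevel) {
        return messageLevel == 0 || messageLevel < advisorLogLevel;
    }

    // Stand-in for writing to the advisor log: prints to the console instead.
    public void log(int messageLevel, String message) {
        if (shouldWrite(messageLevel)) {
            System.out.println("[level " + messageLevel + "] " + message);
        }
    }

    public static void main(String[] args) {
        AdvisorLogModel advisor = new AdvisorLogModel(3);
        advisor.log(0, "always written");
        advisor.log(2, "written because 2 < 3");
        advisor.log(3, "not written because 3 is not less than 3");
    }
}
```

Keeping the rule in one small predicate makes it easy to adjust if the Programming Guide describes the comparison differently for your release.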

Content Based Routing problems

Gather the following required information for Content Based Routing problems.

Not able to hit the cluster

If you are not able to hit the cluster, it is possible that neither or both of the Load Balancer machines have the cluster aliased. To determine which machine owns the cluster:

  1. On the same subnet and not on a Load Balancer machine or server:
    ping cluster
    arp -a
    If you are using Dispatcher's nat or cbr forwarding methods, ping the return address also.
  2. Look through the arp output and match the MAC address (a 12-digit hexadecimal address) to one of the netstat -ni outputs to determine which machine physically owns the cluster.
  3. Use the following commands to interpret the output from both machines to see if they both have the cluster address.

If you do not get a response from the ping, and you are not using ULB, it is possible that neither machine has the cluster IP address aliased to its interface; for example, en0, tr0, and so forth.
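Matching MAC addresses by eye is error-prone because arp -a and netstat -ni print them in different formats (colons or hyphens, with or without leading zeros). The following Java sketch is a hypothetical helper, not a Load Balancer tool: it finds the MAC address that arp -a reports for a given IP address and normalizes MAC strings so that output from different commands compares equal. The sample lines in main are illustrative, not captured from a real system; feed it your own saved arp -a output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for matching the cluster's MAC address in arp -a output
// against the MAC address reported by netstat -ni on each Load Balancer machine.
public class ClusterOwnerCheck {
    // Six groups of 1 or 2 hex digits separated by ':' or '-'.
    private static final Pattern MAC = Pattern.compile(
            "(?<![0-9A-Fa-f:-])(?:[0-9A-Fa-f]{1,2}[:-]){5}[0-9A-Fa-f]{1,2}(?![0-9A-Fa-f:-])");
    private static final Pattern OCTET = Pattern.compile("[0-9A-Fa-f]{1,2}");

    // Normalizes "0:4:AC:62:5:B8", "00-04-ac-62-05-b8", or "0.4.ac.62.5.b8"
    // to "00:04:ac:62:05:b8".
    static String normalizeMac(String raw) {
        String[] parts = raw.trim().split("[:.-]");
        if (parts.length != 6) {
            throw new IllegalArgumentException("Not a 6-octet MAC address: " + raw);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (!OCTET.matcher(parts[i]).matches()) {
                throw new IllegalArgumentException("Bad octet in MAC address: " + raw);
            }
            if (i > 0) {
                sb.append(':');
            }
            if (parts[i].length() == 1) {
                sb.append('0'); // restore the leading zero some tools omit
            }
            sb.append(parts[i].toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    // Returns the normalized MAC address on the first arp line that mentions ip,
    // or null if no line does. The lookarounds stop 10.0.0.5 from matching 10.0.0.50.
    static String macForIp(List<String> arpLines, String ip) {
        Pattern ipPattern = Pattern.compile("(?<![\\d.])" + Pattern.quote(ip) + "(?![\\d.])");
        for (String line : arpLines) {
            if (ipPattern.matcher(line).find()) {
                Matcher m = MAC.matcher(line);
                if (m.find()) {
                    return normalizeMac(m.group());
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not captured from a real system.
        List<String> arpOutput = Arrays.asList(
                "? (10.0.0.5) at 0:4:ac:62:5:b8 [ether] on en0",
                "  10.0.0.50          00-04-ac-62-05-99     dynamic");
        String clusterMac = macForIp(arpOutput, "10.0.0.5");
        String machineAMac = normalizeMac("00:04:AC:62:05:B8"); // from netstat -ni on machine A
        System.out.println("Cluster MAC " + clusterMac + " owned by machine A: "
                + machineAMac.equals(clusterMac));
    }
}
```

If neither Load Balancer machine's normalized MAC address matches, or if the cluster address does not appear in the arp output at all, that points to the aliasing problem described above.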

All else fails

If you are unable to solve routing problems and all else has failed, issue the following command to run a trace on the network traffic:

You can also increase the levels of the different logs (for example, the manager log, the advisor log, and so forth) and investigate their output.

Upgrades

To identify a problem that is already fixed in a service release fix or patch, check for upgrades. To obtain a list of Edge Components defects fixed, refer to the WebSphere® Application Server Web site Support page: http://www.ibm.com/software/webservers/appserv/was/support/. From the Support page, follow the link to the corrective service download site.

Java code

The correct version of Java code is installed as part of the Load Balancer installation.

See Reference Information for links to support and library Web pages. The Web support page contains a link to Self-help information in the form of Technotes.

Troubleshooting tables

Refer to the following for:

Table 12. Dispatcher troubleshooting table
Symptom Possible Cause Go to...
Dispatcher not running correctly Conflicting port numbers Checking Dispatcher port numbers
Configured a collocated server and it will not respond to load balanced requests Wrong or conflicting address Problem: Dispatcher and server will not respond
Connections from client machines not being served or connections timing out
  • Wrong routing configuration
  • NIC not aliased to the cluster address
  • Server does not have loopback device aliased to the cluster address
  • Extra route not deleted
  • Port not defined for each cluster
Problem: Dispatcher requests are not being balanced
Client machines are not being served or are timing out High availability not working Problem: Dispatcher high-availability function is not working
Unable to add heartbeat (Windows platform) Source address is not configured on an adapter Problem: Unable to add heartbeat (Windows platform)
Advisors not working correctly with wide area Advisors are not running on remote machines Problem: Advisors not working correctly
On a backend server running Windows Server 2008, memload.exe crashes The Windows Server 2008 registry might not be populated with the performance keys that these tools require. This application crash would be reported from the cpuload application. Problem: On a Windows Server 2008 backend server, memload.exe crashes
Dispatcher, Microsoft IIS, and SSL are not working or will not continue Unable to send encrypted data across protocols Problem: Dispatcher, Microsoft IIS, and SSL do not work (Windows platform)
Connection to remote machine refused Older version of the keys is still being used Problem: Dispatcher connection to a remote machine
The dscontrol or lbadmin command fails with a 'Server not responding' or 'unable to access RMI server' message
  1. Commands fail due to a socksified stack, or because dsserver was not started
  2. RMI ports are not set correctly
  3. Host file has incorrect local host
Problem: dscontrol or lbadmin command fails
"Cannot Find the File..." error message, when running Netscape as default browser to view online help (Windows platform) Incorrect setting for HTML file association Problem: "Cannot find the file..." error message when trying to view online Help (Windows platform)
Graphical user interface does not start correctly Insufficient paging space Problem: Graphical user interface (GUI) does not start correctly
Error running Dispatcher with Caching Proxy installed Caching Proxy file dependency Problem: Error running Dispatcher with Caching Proxy installed
Graphical user interface does not display correctly. Resolution is incorrect. Problem: Graphical user interface (GUI) does not display correctly
Help panels sometimes disappear behind other windows Java limitation Problem: On Windows platform, help windows sometimes disappear behind other open windows
Load Balancer cannot process and forward a frame Need a unique MAC address for each NIC Problem: Load Balancer cannot process and forward a frame
Blue screen appears No installed and configured network card Problem: A blue screen displays when you start the Load Balancer executor
Path to Discovery prevents return traffic The cluster is aliased on the loopback Problem: Path to Discovery prevents return traffic with Load Balancer
High availability in the Wide Area mode of Load Balancer does not work. Remote Dispatcher must be defined as a server in a cluster on local Dispatcher Problem: High availability in the Wide Area mode of Load Balancer does not work
GUI hangs (or unexpected behavior) when trying to load a large configuration file. Java does not have access to enough memory to handle such a large change to the GUI Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file
IP addresses not resolving correctly over the remote connection When using a remote client over a secure socks implementation, fully qualified domain names or host names might not resolve to the correct IP address Problem: IP addresses not resolving correctly over the remote connection
Korean Load Balancer interface displays overlapping or undesirable fonts on AIX and Linux systems Default fonts must be changed Problem: Korean Load Balancer interface displays overlapping or undesirable fonts on AIX and Linux systems
On Windows systems, after aliasing the MS Loopback adapter, when issuing certain commands such as hostname, the OS will incorrectly respond with the alias address In the network connections list, the newly added alias must not be listed above the local address Problem: On Windows systems, alias address is returned instead of local address when issuing commands such as hostname
Unexpected GUI behavior when using Windows platform paired with Matrox AGP video card Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards
Unexpected behavior, such as system hang, when executing "rmmod ibmlb" on Linux systems Problem occurs when manually removing the Load Balancer kernel module (ibmlb). Problem: Unexpected behavior when executing "rmmod ibmlb" (Linux systems)
Slow response time when running commands on the Dispatcher machine Slow response time can be due to machine overloading from a high volume of client traffic Problem: Slow response time running commands on Dispatcher machine
For Dispatcher's mac forwarding method, SSL or HTTPS advisor not registering server loads Problem occurs because the SSL server application is not configured with the cluster IP address Problem: SSL or HTTPS advisor not registering server loads (when using mac-forwarding)
Disconnect from host when using remote Web administration through Netscape Disconnect from host occurs when you resize the browser window Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration
On Windows platform, corrupted Latin-1 national characters appear in command prompt Change font properties of command prompt window Problem: On Windows systems, corrupted Latin-1 national characters appear in command prompt window
On HP-UX platform, the following message occurs: java.lang.OutOfMemoryError unable to create new native thread Some HP-UX installations by default allow 64 threads per process. This is insufficient. Problem: On HP-UX, Java out of memory or thread error occurs
On Windows platform, advisors and reach targets mark all servers down Task offloading is not disabled, or ICMP might need to be enabled. Problem: On Windows systems, advisors and reach targets mark all servers down
On Windows platform, problem resolving IP address to hostname when more than one address is configured to an adapter The IP address you want as your hostname must appear first in the registry. Problem: On Windows platform, resolving IP address to host name when more than one address is configured to an adapter
On Windows platform, advisors not working in a high availability setup after a network outage When the system detects a network outage, it clears its Address Resolution Protocol (ARP) cache Problem: On Windows systems, after network outage, advisors not working in a high availability setup
On Linux systems, "IP address add" command and multiple cluster loopback aliases are incompatible When aliasing more than one address on the loopback device, use the ifconfig command, not ip address add Problem: On Linux systems, do not use "IP address add" command when aliasing multiple clusters on the loopback device
Error message: "Router address not specified or not valid for port method" when trying to add a server Checklist of information to determine the problem that has occurred when adding a server Problem: "Router address not specified or not valid for port method" error message
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started
A slowdown occurs when loading Load Balancer configurations The delay might be due to Domain Name System (DNS) calls that are made to resolve and verify the server address. Problem: Delay occurs while loading a Load Balancer configuration
On Windows systems, the following error message appears: There is an IP address conflict with another system on the network If high availability is configured, cluster addresses may be configured on both machines for a brief period which causes this error message to appear. Problem: On Windows systems, an IP address conflict error message appears
Both primary and backup machines are active in a high availability configuration This problem may occur when the go scripts do not run on either primary or backup machine. Problem: Both primary and backup machines are active in a high availability configuration
Client requests fail when Dispatcher attempts to return large page responses Client requests that result in large page responses time out if the maximum transmission unit (MTU) is not set properly on the Dispatcher machine when using nat or cbr forwarding. Problem: Client requests fail when attempting the return of large page responses
On Windows systems, "Server not responding" error occurs when issuing a dscontrol or lbadmin command More than one IP address exists on the Windows system, and the hosts file does not specify the address to associate with the host name. Problem: On Windows systems, "Server not responding" error occurs when issuing dscontrol or lbadmin
High availability Dispatcher machines may fail to synchronize on Linux for S/390 on qeth devices When using high availability on Linux for S/390 with the qeth network driver, the active and standby Dispatchers may fail to synchronize. Problem: High availability Dispatcher machines may fail to synchronize on Linux for S/390 systems on qeth drivers
Tips on configuring the high availability feature for Load Balancer The tips will help alleviate high availability problems such as:
  • Connections dropped after takeover
  • Partner machines unable to synchronize
  • Requests erroneously directed to the backup partner machine
Problem: Tips on configuring high availability
Dispatcher MAC forwarding configuration limitations with zSeries and S/390 platforms On Linux, there are limitations when using zSeries or S/390 servers that have Open System Adapter (OSA) cards. Possible workarounds are provided. Problem: On Linux, Dispatcher configuration limitations when using zSeries or S/390 servers that have Open System Adapter (OSA) cards
On some Red Hat Linux versions, a memory leak occurs when running Load Balancer configured with the manager and advisors The IBM Java SDK versions of the JVM and the Native POSIX Thread Library (NPTL) shipped with some Linux distributions, such as Red Hat Enterprise Linux 3.0, can cause the memory leak to occur. Problem: On some Linux versions, a memory leak occurs when running Dispatcher configured with the manager and advisors
On SUSE Linux Enterprise Server 9, Dispatcher report indicates that packets are forwarded (packet-count increases), however packets never actually reach the backend server The iptables NAT module is loaded. There is a possible, but unconfirmed, error in this version of iptables that causes strange behavior when interacting with Dispatcher. Problem: On SUSE Linux Enterprise Server 9, Dispatcher forwards packets, but the packets do not reach the backend server
On Windows systems, when using Dispatcher's high availability feature, problems might occur during takeover If the goScript that configures the cluster IP address on the active machine runs before the goScript to unconfigure the IP cluster address on the backup machine, problems might occur. Problem: On Windows system, IP address conflict message appears during high availability takeover
On Linux systems, iptables can interfere with the routing of packets Linux iptables can interfere with load balancing of traffic and must be disabled on the Load Balancer machine. Problem: Linux iptables can interfere with the routing of packets
A Java fileset warning message appears when installing service fixes or installing natively, using system packaging tools The product installation consists of several packages that are not required to be installed on the same machine, so each of these packages installs a Java fileset. When the packages are installed on the same machine, a warning message states that the Java fileset is also owned by another fileset.
Upgrading the Java fileset provided with the Load Balancer installations If a problem is found with the Java file set, you should report the problem to IBM Service so that you can receive an upgrade for the Java file set that was provided with the Load Balancer installation. Upgrading the Java file set provided with the Load Balancer installation
Persistent connections might drop during high availability takeover on a Windows platform On Microsoft Windows operating systems, persistent connections might drop during a high availability takeover. This problem exists only when you have a collocated server that uses the MAC forwarding method. Problem: Persistent connections might drop during high availability takeover
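The Dispatcher table attributes failed large page responses (with nat or cbr forwarding) to an MTU that is not set properly. As a first check, the following Java sketch lists the MTU of each active interface on the Dispatcher machine using the standard java.net.NetworkInterface class. It is a hypothetical diagnostic, not part of Load Balancer: it only reports values, does not decide what the correct MTU is, and does not change it (use your operating system's network configuration tools for that).

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Lists the MTU of each active network interface using java.net.NetworkInterface.
// It only reports values; it does not decide what the correct MTU is or change it.
public class MtuReport {
    static List<String> mtuByInterface() throws SocketException {
        List<String> report = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            // Older JDK versions document a null return when no interfaces are found.
            return report;
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (nic.isUp()) {
                report.add(nic.getName() + ": MTU " + nic.getMTU());
            }
        }
        return report;
    }

    public static void main(String[] args) throws SocketException {
        for (String line : mtuByInterface()) {
            System.out.println(line);
        }
    }
}
```

Compare the reported values with the MTU that the network path between the Dispatcher and its clients actually supports.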
Table 13. CBR Troubleshooting table
Symptom Possible Cause Go to...
CBR not running correctly Conflicting port numbers Checking CBR port numbers
The cbrcontrol or lbadmin command fails with a 'Server not responding' or 'unable to access RMI server' message Commands fail due to a socksified stack, or because cbrserver was not started Problem: cbrcontrol or lbadmin command fails
Requests are not being load balanced Caching Proxy was started before the executor was started Problem: Requests not being load balanced
On Solaris, the cbrcontrol executor start command fails with the 'Error: Executor was not started.' message The command fails because the system IPC defaults might need to be modified, or the link to the library is incorrect. Problem: On Solaris systems, cbrcontrol executor start command fails
URL rule does not work Syntactical or configuration error Problem: Syntactical or configuration error
Unexpected GUI behavior when using Windows systems paired with Matrox AGP video card Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards
GUI hangs (or unexpected behavior) when trying to load a large configuration file. Java does not have access to enough memory to handle such a large change to the GUI Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file
Disconnect from host when using remote Web administration through Netscape Disconnect from host occurs when you resize the browser window Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration
On Windows platform, corrupted Latin-1 national characters appear in command prompt Change font properties of command prompt window Problem: On Windows platform, corrupted Latin-1 national characters appear in command prompt window
On HP-UX platform, the following message occurs: java.lang.OutOfMemoryError unable to create new native thread Some HP-UX installations by default allow 64 threads per process. This is insufficient. Problem: On HP-UX, Java out of memory/ thread error occurs
On Windows platform, advisors and reach targets mark all servers down Task offloading is not disabled, or ICMP might need to be enabled. Problem: On Windows systems, advisors and reach targets mark all servers down
On Windows platform, problem resolving IP address to host name when more than one address is configured to an adapter The IP address you want as your hostname must appear first in the registry. Problem: On Windows systems, resolving IP address to host name when more than one address is configured to an adapter
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started
Table 14. Site Selector troubleshooting table
Symptom Possible Cause Go to...
Site Selector not running correctly Conflicting port number Checking Site Selector port numbers
Site Selector does not round-robin incoming requests from Solaris client Solaris systems run a "name service cache daemon" Problem: Site Selector does not round-robin traffic from Solaris clients
The sscontrol or lbadmin command fails with a 'Server not responding' or 'unable to access RMI server' message Commands fail due to a socksified stack, or because ssserver was not started. Problem: sscontrol or lbadmin command fails
ssserver fails to start on Windows platform Windows systems do not require the host name to be in the DNS. Problem: The ssserver is failing to start on Windows platform
Machine with duplicate routes not load balancing correctly -- name resolution appears to fail Site Selector machine with multiple adapters attached to the same subnet Problem: Site Selector with duplicate routes not load balancing correctly
Unexpected GUI behavior when using Windows platform paired with Matrox AGP video card Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards
GUI hangs (or unexpected behavior) when trying to load a large configuration file. Java does not have access to enough memory to handle such a large change to the GUI Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file
Disconnect from host when using remote Web administration through Netscape Disconnect from host occurs when you resize the browser window Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration
On Windows platform, corrupted Latin-1 national characters appear in command prompt Change font properties of command prompt window Problem: On Windows platform, corrupted Latin-1 national characters appear in command prompt window
On HP-UX platform, the following message occurs: java.lang.OutOfMemoryError unable to create new native thread Some HP-UX installations by default allow 64 threads per process. This is insufficient. Problem: On HP-UX, Java out of memory/thread error occurs
On Windows platform, advisors and reach targets mark all servers down Task offloading is not disabled, or ICMP might need to be enabled. Problem: On Windows systems, advisors and reach targets mark all servers down
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started
Table 15. Controller for Cisco CSS Switches troubleshooting table
Symptom Possible Cause Go to...
ccoserver will not start Conflicting port numbers Checking Cisco CSS Controller port numbers
The ccocontrol or lbadmin command fails with a 'Server not responding' or 'unable to access RMI server' message Commands fail due to a socksified stack, or because ccoserver was not started. Problem: ccocontrol or lbadmin command fails
Received error: Cannot create registry on port 13099 Expired product license Problem: Cannot create registry on port 13099
Unexpected GUI behavior when using Windows platform paired with Matrox AGP video card Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards
Received a connection error when adding a consultant Configuration settings are incorrect on the switch or the controller Problem: Received a connection error when adding a consultant
Weights are not being updated on the switch Communication between the controller and the switch is unavailable or interrupted Problem: Weights are not being updated on the switch
Refresh command did not update the consultant configuration Communication between the switch and the controller is unavailable or interrupted Problem: Refresh command did not update the consultant configuration
GUI hangs (or unexpected behavior) when trying to load a large configuration file. Java does not have access to enough memory to handle such a large change to the GUI Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file
Disconnect from host when using remote Web administration through Netscape Disconnect from host occurs when you resize the browser window Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration
On Windows platform, corrupted Latin-1 national characters appear in command prompt Change font properties of command prompt window Problem: On Windows platform, corrupted Latin-1 national characters appear in command prompt window
On HP-UX platform, the following message occurs: java.lang.OutOfMemoryError unable to create new native thread Some HP-UX installations by default allow 64 threads per process. This is insufficient. Problem: On HP-UX, Java out of memory/ thread error occurs
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started
Table 16. Nortel Alteon Controller troubleshooting table
Symptom Possible Cause Go to...
nalserver will not start Conflicting port numbers Checking Nortel Alteon Controller port numbers
The nalcontrol or lbadmin command fails with a 'Server not responding' or 'unable to access RMI server' message Commands fail due to a socksified stack, or because nalserver was not started. Problem: nalcontrol or lbadmin command fails
Received error: Cannot create registry on port 14099 Expired product license Problem: Cannot create registry on port 14099
Unexpected GUI behavior when using Windows platform paired with Matrox AGP video card Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards
GUI hangs (or unexpected behavior) when trying to load a large configuration file. Java does not have access to enough memory to handle such a large change to the GUI Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file
Disconnect from host when using remote Web administration through Netscape Disconnect from host occurs when you resize the browser window Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration
Received a connection error when adding a consultant Configuration settings are incorrect on the switch or the controller Problem: Received a connection error when adding a consultant
Weights are not being updated on the switch Communication between the controller and the switch is unavailable or interrupted Problem: Weights are not being updated on the switch
Refresh command did not update the consultant configuration Communication between the switch and the controller is unavailable or interrupted Problem: Refresh command did not update the consultant configuration
On Windows platform, corrupted Latin-1 national characters appear in command prompt Change font properties of command prompt window Problem: On Windows systems, corrupted Latin-1 national characters appear in command prompt window
On HP-UX platform, the following message occurs: java.lang.OutOfMemoryError unable to create new native thread Some HP-UX installations by default allow 64 threads per process. This is insufficient. Problem: On HP-UX, Java out of memory/ thread error occurs
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started
Table 17. Metric Server troubleshooting table
Symptom Possible Cause Go to...
Metric Server IOException on Windows platform running .bat or .cmd user metric files Full metric name is required Problem: Metric Server IOException on Windows platform running .bat or .cmd user metric files
Metric Server not reporting the load information to the Load Balancer machine Possible causes include:
  • no key files on Metric Server machine
  • host name of Metric Server machine not registered with local nameserver
  • the /etc/hosts file has the local hostname resolving to the loopback address 127.0.0.1
Problem: Metric Server not reporting loads to Load Balancer machine
Metric Server log reports "Signature is necessary for access to agent" when key files transferred to server Key file fails authorization due to corruption. Problem: Metric Server log reports "Signature is necessary for access to agent"
On AIX systems, when running Metric Server under heavy stress on a multi-processor system (AIX 5.1), ps -vg command output may become corrupted APAR IY33804 corrects this known AIX problem Problem: On AIX systems, while running Metric Server under heavy stress, ps -vg command output may become corrupted
Configuring Metric Server in a two-tier configuration with Site Selector load-balancing across high-availability Dispatchers Metric Server (residing in the second-tier) is not configured to listen on a new IP address. Problem: Configuring Metric Server in a two-tier configuration with Site Selector load-balancing across high-availability Dispatchers
Scripts (metricserver, cpuload, memload) running on multi-CPU Solaris machines produce unwanted console messages This behavior is due to the use of the vmstat system command to gather CPU and memory statistics from the kernel. Problem: Scripts, running on multi-CPU Solaris machines, produce unwanted console messages
On Solaris systems, Load Balancer processes end when you exit the terminal session window from which they started Use the nohup command to prevent the processes that you started from receiving a hangup signal when you exit the terminal session. Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started
Metric value returns -1 after starting Metric Server This problem might be caused by the key files losing their integrity during the transfer of the key files to the client. Problem: After starting Metric Server, metric value returns -1
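One Metric Server cause listed in Table 17 is an /etc/hosts entry that resolves the local host name to the loopback address 127.0.0.1. The following Java sketch is a hypothetical helper, not shipped with Load Balancer: it resolves the local host name through the JVM's resolver and warns if the result is a loopback address. The JVM's name resolution and caching can differ from other tools on the system, so treat the result as a hint and confirm it by inspecting /etc/hosts.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical check: does the local host name resolve to a loopback address?
public class LocalHostCheck {
    // True for 127.0.0.0/8 and ::1, as defined by InetAddress.isLoopbackAddress().
    static boolean isLoopback(InetAddress address) {
        return address.isLoopbackAddress();
    }

    public static void main(String[] args) {
        try {
            // Resolves the host name reported by the system through the JVM's resolver.
            InetAddress local = InetAddress.getLocalHost();
            System.out.println(local.getHostName() + " resolves to " + local.getHostAddress());
            if (isLoopback(local)) {
                System.out.println("Warning: the host name resolves to a loopback address; check /etc/hosts.");
            }
        } catch (UnknownHostException e) {
            System.out.println("The local host name could not be resolved: " + e.getMessage());
        }
    }
}
```

An unresolvable host name is reported as well, since an unregistered Metric Server host name is another cause listed in the same table row.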

Checking Dispatcher port numbers

If you are experiencing problems running Dispatcher, it may be that one of your applications is using a port number that the Dispatcher normally uses. Be aware that the Dispatcher server uses the following port numbers:

If another application is using one of the Dispatcher's port numbers, you can either change the Dispatcher's port numbers or change the application's port number.
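Before changing any port numbers, you can confirm whether a given port is already taken. The following Java sketch is a hypothetical helper, not part of Load Balancer: it tries to bind each TCP port given on the command line. The Dispatcher port numbers are deliberately not hard-coded, so pass the ones listed for your installation. Run it while dsserver is stopped; otherwise Dispatcher's own ports are reported as in use. A successful bind is a strong hint, not proof, because socket-option semantics differ between operating systems, and binding ports below 1024 usually requires administrator privileges.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper: reports whether TCP ports can be bound on this machine.
public class PortCheck {
    // Returns true if the port could be bound (and was released immediately),
    // false if binding failed, typically because another process already uses it.
    static boolean isPortFree(int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java PortCheck <port> [<port> ...]");
            return;
        }
        for (String arg : args) {
            int port = Integer.parseInt(arg);
            System.out.println("Port " + port + ": "
                    + (isPortFree(port) ? "free" : "in use or not bindable"));
        }
    }
}
```

The same check applies to the CBR, Site Selector, and controller port numbers described in the following sections.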

Change the Dispatcher’s port numbers by doing the following:

Change the application's RMI port number by doing the following:

Note:
For Windows platform, dsserver and metricserver files are in the <install_root>ibm\edge\lb\bin directory. For other platforms, these files are in the /usr/bin/ directory.

Checking CBR port numbers

If you are experiencing problems running CBR, it may be that one of your applications is using a port number that CBR normally uses. Be aware that CBR uses the following port number:

Note:
The Content Based Routing (CBR) component is not available on platforms that run a 64-bit JVM, except for HP-UX ia64. On HP-UX ia64, the CBR component runs as a 32-bit application. You can use the CBR forwarding method of Load Balancer's Dispatcher component to provide content-based routing without the use of Caching Proxy. See Dispatcher's content-based routing (cbr forwarding method) for more information.

If another application is using one of CBR's port numbers, you can either change CBR's port numbers or change the application's port number.

Change CBR's port numbers by doing the following:

Change the application's RMI port number by doing the following:

Note:
On Windows platforms, the cbrserver and metricserver files are in the <install_root>ibm\edge\lb\bin directory. On other platforms, these files are in the /usr/bin directory.

Checking Site Selector port numbers

If you are experiencing problems running the Site Selector component, it may be that one of your applications is using a port number that Site Selector normally uses. Be aware that Site Selector uses the following port numbers:

If another application is using one of the Site Selector's port numbers, you can either change the Site Selector’s port numbers or change the application's port number.

Change the Site Selector's port numbers by doing the following:

Change the application's RMI port number by doing the following:

Note:
On Windows platforms, the ssserver and metricserver files are in the <install_root>ibm\edge\lb\bin directory. On other platforms, these files are in the /usr/bin directory.

Checking Cisco CSS Controller port numbers

If you are experiencing problems running the Cisco CSS Controller component, it may be that another application is using one of the port numbers used by Cisco CSS Controller's ccoserver. Be aware that Cisco CSS Controller uses the following port numbers:

If another application is using one of the Cisco CSS Controller's port numbers, you can either change the port numbers for Cisco CSS Controller or change the application's port number.

Change the Cisco CSS Controller's port numbers by doing the following:

Change the application's RMI port number by doing the following:

Note:
On Windows platforms, the ccoserver and metricserver files are in the <install_root>ibm\edge\lb\bin directory. On other platforms, these files are in the /usr/bin directory.

Checking Nortel Alteon Controller port numbers

If you are experiencing problems running the Nortel Alteon Controller component, it may be that another application is using one of the port numbers used by Nortel Alteon Controller's nalserver. Be aware that Nortel Alteon Controller uses the following port numbers:

If another application is using one of the Nortel Alteon Controller's port numbers, you can either change the port numbers for Nortel Alteon Controller or change the port numbers for the application.

Change the port numbers for Nortel Alteon Controller by doing the following:

Change the application's RMI port number by doing the following:

Note:
On Windows platforms, the nalserver and metricserver files are in the <install_root>ibm\edge\lb\bin directory. On other platforms, these files are in the /usr/bin directory.

Solving common problems--Dispatcher

Problem: Dispatcher will not run

This problem can occur when another application is using one of the ports used by the Dispatcher. For more information, go to Checking Dispatcher port numbers.

Problem: Dispatcher and server will not respond

This problem occurs when an address other than the one specified is being used. When collocating the Dispatcher and server, be sure that the server address used in the configuration is the NFA address or is configured as collocated. Also, check the hosts file for the correct address.

Problem: Dispatcher requests are not being balanced

This problem has symptoms such as connections from client machines not being served or connections timing out. Check the following to diagnose this problem:

  1. Have you configured the nonforwarding address, clusters, ports, and servers for routing? Check the configuration file.
  2. Is the network interface card aliased to the cluster address? For AIX, HP-UX, Linux, and Solaris operating systems, use netstat -ni to check.
  3. Does the loopback device on each server have the alias set to the cluster address? For AIX, HP-UX, Linux, and Solaris operating systems, use netstat -ni to check.
  4. Is the extra route deleted? For AIX, HP-UX, Linux, and Solaris operating systems, use netstat -nr to check.
  5. Use the dscontrol cluster status command to check the information for each cluster you have defined. Make sure you have a port defined for each cluster.
  6. Use the dscontrol server report :: command to make sure that your servers are neither down nor set to a weight of zero.

For Windows and other platforms, see also Setting up server machines for load balancing.
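Checks 2 and 3 in the list above ask whether an address is aliased on a local interface. As a minimal sketch, assuming Java runs on the machine being checked, the standard java.net.NetworkInterface API can answer that question directly; the class AliasCheck is hypothetical and not part of Load Balancer.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Load Balancer: lists the local interfaces
// on which an address is configured, similar to scanning netstat -ni output.
final class AliasCheck {
    private AliasCheck() {}

    static SortedSet<String> interfacesHolding(String address)
            throws SocketException, UnknownHostException {
        // A literal IP address is only parsed; a host name would be looked up.
        InetAddress target = InetAddress.getByName(address);
        SortedSet<String> names = new TreeSet<>();
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            collect(nic, target, names);
        }
        return names;
    }

    // Checks an interface and, recursively, its virtual sub-interfaces (such as lo:1).
    // An alias address may be reported under both the parent and the sub-interface.
    private static void collect(NetworkInterface nic, InetAddress target, SortedSet<String> out) {
        for (InetAddress a : Collections.list(nic.getInetAddresses())) {
            if (a.equals(target)) {
                out.add(nic.getName());
            }
        }
        for (NetworkInterface sub : Collections.list(nic.getSubInterfaces())) {
            collect(sub, target, out);
        }
    }
}
```

Run it with the cluster address on the Dispatcher machine (check 2) and on each server (check 3); an empty result means the alias is missing. It does not replace the netstat -nr route check in step 4.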

Problem: Dispatcher high-availability function is not working

This problem appears when a Dispatcher high-availability environment is configured and connections from the client machines are not being served or are timing out. Check the following to correct or diagnose the problem:

The following steps are an effective way to test that high availability scripts are functioning properly:

  1. Gather a report by issuing the netstat -an and ifconfig -a commands on the machine.
  2. Run the goActive script.
  3. Run the goStandby script.
  4. Gather a second report by issuing the netstat -an and ifconfig -a commands again.

If the scripts are properly configured, the two reports are identical.
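The address part of this before-and-after comparison can also be automated. The following minimal Java sketch assumes that you only need the interface addresses (the ifconfig -a part of the report); it does not capture the netstat -an socket state. AddressSnapshot is a hypothetical name, not a Load Balancer class.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Load Balancer: records "interface address"
// pairs so that two snapshots can be compared.
final class AddressSnapshot {
    private AddressSnapshot() {}

    /** One entry per address, for example "eth0 10.0.0.5". */
    static SortedSet<String> take() throws SocketException {
        SortedSet<String> entries = new TreeSet<>();
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            for (InetAddress a : Collections.list(nic.getInetAddresses())) {
                entries.add(nic.getName() + " " + a.getHostAddress());
            }
        }
        return entries;
    }

    /** Entries present in 'after' but not in 'before', such as a cluster alias left behind. */
    static SortedSet<String> added(Set<String> before, Set<String> after) {
        SortedSet<String> diff = new TreeSet<>(after);
        diff.removeAll(before);
        return diff;
    }
}
```

Take one snapshot before running goActive and another after goStandby, then call added in both directions; two empty results correspond to identical address reports.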

Problem: Unable to add heartbeat (Windows platform)

This Windows platform error occurs when the source address is not configured on an adapter. Check the following to correct or diagnose the problem.

Problem: Advisors not working correctly

If you are using wide area support, and your advisors do not seem to work correctly, make sure that they are started on both the local and the remote Dispatchers.

An ICMP ping is issued to the servers before the advisor request. If a firewall exists between Load Balancer and the servers, ensure that pings are supported across the firewall. If this setup poses a security risk to your network, modify the java statement in dsserver to turn off all pings to the servers by adding the java property:

LB_ADV_NO_PING="true"
java -DLB_ADV_NO_PING="true"

See Using remote advisors with Dispatcher's wide area support.
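The -D option sets a Java system property in the JVM that dsserver starts. To illustrate the mechanism only (this is not Load Balancer's source code), a hypothetical pre-check in Java could read the flag and skip the reachability probe when it is set. The class PingPrecheck is an assumption for this example.

```java
import java.io.IOException;
import java.net.InetAddress;

// Illustration only: a hypothetical pre-check that honors -DLB_ADV_NO_PING=true.
// It is not taken from Load Balancer.
final class PingPrecheck {
    private PingPrecheck() {}

    /** True unless the JVM was started with -DLB_ADV_NO_PING=true (compared case-insensitively). */
    static boolean pingEnabled() {
        // Boolean.getBoolean is true only if the property exists and equals "true", ignoring case.
        return !Boolean.getBoolean("LB_ADV_NO_PING");
    }

    /**
     * Returns true without sending anything when pings are disabled. Otherwise it uses
     * InetAddress.isReachable, which sends an ICMP echo request when the JVM has the
     * privilege to do so and otherwise tries a TCP connection to port 7 (echo).
     */
    static boolean reachable(InetAddress server, int timeoutMs) throws IOException {
        if (!pingEnabled()) {
            return true;
        }
        return server.isReachable(timeoutMs);
    }
}
```

The shell removes the quotation marks in -DLB_ADV_NO_PING="true", so the JVM sees the value true.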

Problem: On a Windows Server 2008 backend server, memload.exe crashes

When Load Balancer is connecting to a backend server that runs Windows Server 2008, the metric collection features might cause the memload.exe application to stop running unexpectedly.

The crash occurs because the Windows Server 2008 registry might not be populated with the performance keys that these tools require. This application crash would be reported from the cpuload application.

Refer to the following Knowledge Base topic from Microsoft for steps on how to address this problem: http://support.microsoft.com/kb/300956

Problem: Dispatcher, Microsoft IIS, and SSL do not work (Windows platform)

If Dispatcher, Microsoft IIS, and SSL do not work together, there may be a problem with how SSL security is enabled. For more information about generating a key pair, acquiring a certificate, installing a certificate with a key pair, and configuring a directory to require SSL, see the Microsoft Information and Peer Web Services documentation.

Problem: Dispatcher connection to a remote machine

Dispatcher uses keys to allow you to connect to a remote machine and configure it. The keys specify an RMI port for the connection. You can change the RMI port for security reasons or to resolve a port conflict; when you do, the key file has a different filename. If your keys directory contains more than one key for the same remote machine, and the keys specify different RMI ports, the command line tries only the first key it finds. If that key specifies the wrong port, the connection is refused, and it continues to be refused until you delete the incorrect key.

Problem: dscontrol or lbadmin command fails

  1. The dscontrol command returns: Error: Server not responding. Or, the lbadmin command returns: Error: unable to access RMI server. These errors can result when your machine has a socksified stack. To correct this problem, edit the socks.cnf file to contain the following lines:
    EXCLUDE-MODULE java
    EXCLUDE-MODULE javaw
  2. The administration consoles for Load Balancer interfaces (command line, graphical user interface, and wizards) communicate with dsserver using remote method invocation (RMI). The default communication uses three ports; each port is set in the dsserver start script:

    This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue dscontrol commands, you might see errors such as Error: Server not responding.

    To avoid this problem, edit the dsserver script file to set the port used by RMI for the firewall (or other application). Change the line LB_RMISERVERPORT=10199 to LB_RMISERVERPORT=yourPort, where yourPort is a different port.

    When you are finished, restart dsserver and open traffic for ports 10099, 10004, 10199, and 10100 (or, if you changed it, for your chosen port in place of 10199) from the host address where the administration console will run.

  3. These errors can also occur if you have not already started dsserver.
  4. If there are multiple adapters on the machine, you must designate which adapter dsserver uses by adding the following to the dsserver script: java.rmi.server.hostname=<host_name or IPaddress>

    For example: java -Djava.rmi.server.hostname="10.1.1.1"
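When a console runs through a firewall, it can help to confirm from the console machine that TCP connections to the Load Balancer host actually get through. The following is a minimal Java sketch under that assumption; ConsoleReach is a hypothetical name, and a successful connection shows only TCP reachability, not that RMI itself works.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Load Balancer: run it on the machine where the
// administration console runs to test TCP connectivity to the Load Balancer host.
final class ConsoleReach {
    private ConsoleReach() {}

    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused (nothing listening, for example dsserver not started),
            // timed out (often a firewall dropping packets), or unresolvable host.
            return false;
        }
    }

    /** Maps each port to whether a connection succeeded, in the order given. */
    static Map<Integer, Boolean> check(String host, int timeoutMs, int... ports) {
        Map<Integer, Boolean> result = new LinkedHashMap<>();
        for (int port : ports) {
            result.put(port, canConnect(host, port, timeoutMs));
        }
        return result;
    }
}
```

For example, ConsoleReach.check("lbhost.example.com", 2000, 10099, 10004, 10199, 10100), where lbhost.example.com stands for your Load Balancer machine. A refused connection on every port usually points to item 3 above (dsserver not started); a timeout usually points to the firewall.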

Problem: "Cannot find the file..." error message when trying to view online Help (Windows platform)

For Windows platforms, when using Netscape as your default browser, the following error message may result: "Cannot find the file '<filename>.html' (or one of its components). Make sure the path and filename are correct and that all required libraries are available."

The problem is due to an incorrect setting for HTML file association. The solution is the following:

  1. Click My Computer, click Tools, select Folder Options, and click the File Types tab
  2. Select "Netscape Hypertext Document"
  3. Click the Advanced button, select open, and click the Edit button
  4. Enter NSShell in the Application: field (not the Application Used to Perform Action: field), and click OK

Problem: Graphical user interface (GUI) does not start correctly

The graphical user interface (GUI), which is lbadmin, requires a sufficient amount of paging space to function correctly. If insufficient paging space is available, the GUI might not start up completely. If this occurs, check your paging space and increase it if necessary.

Problem: Error running Dispatcher with Caching Proxy installed

If you uninstall Load Balancer to reinstall another version and get an error when you attempt to start the Dispatcher component, check to see if Caching Proxy is installed. Caching Proxy has a dependency on one of the Dispatcher files; this file will uninstall only when Caching Proxy is uninstalled.

To avoid this problem:

  1. Uninstall Caching Proxy.
  2. Uninstall Load Balancer.
  3. Reinstall both Load Balancer and Caching Proxy.

Problem: Graphical user interface (GUI) does not display correctly

If you experience a problem with the appearance of the Load Balancer GUI, check the setting for the operating system's desktop resolution. The GUI is best viewed at a resolution of 1024x768 pixels.

Problem: On Windows platform, help windows sometimes disappear behind other open windows

On Windows platform, when you first open help windows, they sometimes disappear into the background behind existing windows. If this occurs, click on the window to bring it forward again.

Problem: Load Balancer cannot process and forward a frame

On Solaris each network adapter has the same MAC address by default. This works properly when each adapter is on a different IP subnet; however, in a switched environment, when multiple NICs with the same MAC and the same IP subnet address communicate with the same switch, the switch sends all traffic bound for the single MAC (and both IPs) down the same wire. Only the adapter that last put a frame on the wire sees the IP packets bound for both adapters. Solaris might discard packets for a valid IP address that arrived on the "wrong" interface.

If not all network interfaces are designated for Load Balancer in ibmlb.conf, and a NIC that is not defined in ibmlb.conf receives a frame, Load Balancer cannot process and forward that frame.

To avoid this problem, you must override the default and set a unique MAC address for each interface. Use this command:

ifconfig interface ether macAddr

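For example (choose a unicast address: an address whose first octet is odd, such as 01:02:03:04:05:06, is a multicast address and should not be assigned to an interface):

ifconfig eri0 ether 02:02:03:04:05:06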
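To find which interfaces share a MAC address before changing them, you can list hardware addresses with the standard java.net.NetworkInterface API. The following is a minimal sketch; MacCheck and its methods are hypothetical, and the unicast test relies on the standard rule that the lowest bit of the first octet marks a group (multicast) address.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Load Balancer: finds interfaces that share a MAC
// address and checks that a proposed replacement MAC is a unicast address.
final class MacCheck {
    private MacCheck() {}

    /** Formats bytes as lowercase colon-separated hex, for example 08:00:20:01:02:03. */
    static String format(byte[] mac) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mac.length; i++) {
            if (i > 0) sb.append(':');
            sb.append(String.format("%02x", mac[i] & 0xff));
        }
        return sb.toString();
    }

    /** Groups interface names by MAC and keeps only MACs used by more than one interface. */
    static Map<String, List<String>> shared(Map<String, byte[]> macByInterface) {
        Map<String, List<String>> byMac = new TreeMap<>();
        for (Map.Entry<String, byte[]> e : macByInterface.entrySet()) {
            byMac.computeIfAbsent(format(e.getValue()), k -> new ArrayList<>()).add(e.getKey());
        }
        byMac.values().removeIf(names -> names.size() < 2);
        return byMac;
    }

    /** Reads local hardware addresses; interfaces without a 6-byte MAC (such as loopback) are skipped. */
    static Map<String, byte[]> localMacs() throws SocketException {
        Map<String, byte[]> macs = new TreeMap<>();
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            byte[] mac = nic.getHardwareAddress();
            if (mac != null && mac.length == 6) {
                macs.put(nic.getName(), mac);
            }
        }
        return macs;
    }

    /** A MAC is unicast when the least significant bit of its first octet is 0. */
    static boolean isUnicast(String mac) {
        int firstOctet = Integer.parseInt(mac.substring(0, 2), 16);
        return (firstOctet & 0x01) == 0;
    }
}
```

MacCheck.shared(MacCheck.localMacs()) returns an empty map once every interface has its own MAC address.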

Problem: A blue screen displays when you start the Load Balancer executor

On Windows platform, you must have a network card installed and configured before starting the executor.

Problem: Path to Discovery prevents return traffic with Load Balancer

The AIX operating system contains a networking parameter called path MTU discovery. During a transaction with a client, if the operating system determines that it must use a smaller maximum transmission unit (MTU) for the outgoing packets, path MTU discovery causes AIX to create a route to remember that data. The new route is for that specific client IP and records the necessary MTU to reach it.

When the route is being created, a problem might occur on the servers resulting from the cluster being aliased on the loopback. If the gateway address for the route falls in the subnet of the cluster/netmask, AIX systems create the route on the loopback. This happens because that was the last interface aliased with that subnet.

For example, if the cluster is 9.37.54.69, the netmask is 255.255.255.0, and the intended gateway is 9.37.54.1, AIX systems use the loopback for the route. As a result, the server's responses never leave the machine, and the client times out waiting. Typically the client sees one response from the cluster; then the route is created and the client receives nothing more.

To address this problem, enter the following commands:

/usr/sbin/no -p -o udp_pmtu_discover=0
/usr/sbin/no -p -o tcp_pmtu_discover=0

The -p flag makes the change apply both to the current values and to the values used after a reboot, so the setting persists.
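The condition that triggers this problem is a subnet test: the gateway falls in the cluster's subnet when both addresses are equal after masking with the netmask. The following minimal Java sketch (IPv4 only; SubnetCheck is a hypothetical name) applies that test to the example values above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Load Balancer: reports whether two IPv4
// addresses fall in the same subnet for a given netmask.
final class SubnetCheck {
    private SubnetCheck() {}

    /** Converts a dotted IPv4 literal to a 32-bit int; a host name would trigger a DNS lookup. */
    static int toInt(String ipv4) throws UnknownHostException {
        byte[] b = InetAddress.getByName(ipv4).getAddress();
        if (b.length != 4) {
            throw new IllegalArgumentException("IPv4 only: " + ipv4);
        }
        return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16) | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
    }

    /** True when a and b are equal after applying the netmask. */
    static boolean sameSubnet(String a, String b, String netmask) throws UnknownHostException {
        int mask = toInt(netmask);
        return (toInt(a) & mask) == (toInt(b) & mask);
    }
}
```

SubnetCheck.sameSubnet("9.37.54.69", "9.37.54.1", "255.255.255.0") is true, which is the situation in which AIX can place the path MTU route on the loopback.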

Problem: High availability in the Wide Area mode of Load Balancer does not work

When you set up a Wide Area Load Balancer, you must define the remote Dispatcher as a server in a cluster on your local Dispatcher. Typically, you use the non-forwarding address (NFA) of the remote Dispatcher as the destination address of the remote server. If you do this, and then set up high availability on the remote Dispatcher, it will fail. This happens because the local Dispatcher always points to the primary on the remote side when you use its NFA to access it.

To get around this problem:

  1. Define an additional cluster on the remote Dispatcher. It is not necessary to define ports or servers for this cluster.
  2. Add this cluster address to your goActive and goStandby scripts.
  3. On your local Dispatcher, define this cluster address as a server, instead of the NFA of the remote primary Dispatcher.

When the remote primary Dispatcher comes up, it will alias this address on its adapter, allowing it to accept traffic. If a failure occurs, the address moves to the backup machine and the backup continues to accept traffic for that address.

Problem: GUI hangs (or unexpected behavior) when trying to load a large configuration file

When using lbadmin or Web administration (lbwebaccess) to load a large configuration file (roughly 200 or more add commands), the GUI may hang or behave unexpectedly, for example by responding to screen changes extremely slowly.

This occurs because Java does not have access to enough memory to handle such a large configuration.

There is an option on the runtime environment that can be specified to increase the memory allocation pool available to Java.

The option is -Xmxn where n is the maximum size, in bytes, for the memory allocation pool. n must be a multiple of 1024 and must be greater than 2MB. The value n may be followed by k or K to indicate kilobytes, or m or M to indicate megabytes. For example, -Xmx128M and -Xmx81920k are both valid. The default value is 64M.

For example, to add this option, edit the lbadmin script file, modifying "javaw" to "javaw -Xmxn" as follows. (For AIX systems, modify "java" to "java -Xmxn"):

There is no recommended value for n, but it should be greater than the default value. A good starting point is twice the default value.
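The size rules above are easy to get wrong by hand. The following hypothetical Java helper (not part of Load Balancer) converts an -Xmx size to bytes, applies the two rules quoted above, and builds an option for twice a given size. It handles only the k and m suffixes that the text describes.

```java
// Hypothetical helper, not part of Load Balancer: converts an -Xmx size to bytes,
// checks it against the rules quoted above, and suggests twice a given size.
final class HeapSize {
    private HeapSize() {}

    private static final long KB = 1024L;
    private static final long MB = 1024L * 1024L;

    /** Parses "n", "nk"/"nK" or "nm"/"nM" into bytes. */
    static long toBytes(String size) {
        char last = Character.toLowerCase(size.charAt(size.length() - 1));
        if (last == 'k') {
            return Long.parseLong(size.substring(0, size.length() - 1)) * KB;
        }
        if (last == 'm') {
            return Long.parseLong(size.substring(0, size.length() - 1)) * MB;
        }
        return Long.parseLong(size);
    }

    /** The size must be a multiple of 1024 and greater than 2 MB. */
    static boolean isValid(String size) {
        long bytes = toBytes(size);
        return bytes % KB == 0 && bytes > 2 * MB;
    }

    /** Builds -Xmx for twice the size, in M when it divides evenly, else in k (for valid sizes). */
    static String doubled(String size) {
        long bytes = 2 * toBytes(size);
        return bytes % MB == 0 ? "-Xmx" + (bytes / MB) + "M" : "-Xmx" + (bytes / KB) + "k";
    }

    /** Maximum heap of the JVM running this code, for comparison. */
    static long currentMaxBytes() {
        return Runtime.getRuntime().maxMemory();
    }
}
```

For the 64M default, HeapSize.doubled("64M") returns -Xmx128M, the value you would append to javaw (or java on AIX systems) in the lbadmin script.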

Problem: lbadmin disconnects from server after updating configuration

If Load Balancer administration (lbadmin) disconnects from the server after you update the configuration, check the version of dsserver on the server that you are attempting to configure, and ensure that it is the same as your version of lbadmin or dscontrol.

Problem: IP addresses not resolving correctly over the remote connection

When using a remote client over a secure socks implementation, fully qualified domain names or host names might not resolve to the correct IP address, because the socks implementation might add specific, socks-related data to the DNS resolution.

If the IP addresses are not resolving correctly over the remote connection, specify the IP address itself, in IP address notation, instead of the host name.
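To see what a name resolves to on the client, you can ask the JVM's resolver directly. Java's InetAddress only checks the format of a literal IP address and performs no name lookup, which is why specifying the address itself sidesteps the problem. The following minimal sketch assumes the check runs on the client machine; ResolveCheck is a hypothetical name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Load Balancer: shows which addresses a name
// resolves to on this client so you can compare them with the address you expect.
final class ResolveCheck {
    private ResolveCheck() {}

    /** All addresses for a host name; a literal IP address is returned as-is. */
    static List<String> resolve(String hostOrLiteral) throws UnknownHostException {
        List<String> out = new ArrayList<>();
        for (InetAddress a : InetAddress.getAllByName(hostOrLiteral)) {
            out.add(a.getHostAddress());
        }
        return out;
    }

    /** True if the name resolves to the expected address (among possibly several). */
    static boolean resolvesTo(String hostOrLiteral, String expectedIp) throws UnknownHostException {
        return resolve(hostOrLiteral).contains(InetAddress.getByName(expectedIp).getHostAddress());
    }
}
```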

Problem: Korean Load Balancer interface displays overlapping or undesirable fonts on AIX and Linux systems

To correct overlapping or undesirable fonts in the Korean Load Balancer interface:

Problem: On Windows systems, alias address is returned instead of local address when issuing commands such as hostname

On Windows systems, after aliasing the MS Loopback adapter, when issuing certain commands such as hostname, the OS will incorrectly respond with the alias address instead of the local address. To correct this problem, in the network connections list, the newly added alias must be listed below the local address. This will ensure that the local address is accessed prior to the loopback alias.

To check the network connections list:

  1. Click Start > Settings > Network and Dial-up Connections
  2. From the Advanced menu option, select Advanced Settings...
  3. Ensure the Local Area Connection is listed first in the Connections box
  4. If necessary, use the ordering buttons on the right to move entries up or down in the list

Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards

On Windows platform when using a Matrox AGP card, unexpected behavior can occur in the Load Balancer GUI. When clicking the mouse, a block of space slightly larger than the mouse pointer can become corrupted causing possible highlighting reversal or images to shift out of place on the screen. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.

Problem: Unexpected behavior when executing "rmmod ibmlb" (Linux systems)

On Linux systems, if dsserver is still running during the manual removal of the Load Balancer kernel module, unexpected behavior, such as a system hang or javacores, can occur. When manually removing the Load Balancer kernel module, you must first stop dsserver.

If "dsserver stop" does not work, stop the Java process whose command line includes SRV_KNDConfigServer. Find its process identifier with the ps -ef | grep SRV_KNDConfigServer command, and then end the process with the kill process_id command.

You can then safely run the "rmmod ibmlb" command to remove the Load Balancer module from the kernel.
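The ps and grep step can also be expressed with the ProcessHandle API (Java 9 or later). This is a minimal sketch, assuming the JVM running it is allowed to read the other process's command line (typically the same user or root); ProcFinder is a hypothetical name, not a Load Balancer tool.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (Java 9 or later), not part of Load Balancer: lists processes
// whose command line contains a marker such as SRV_KNDConfigServer.
final class ProcFinder {
    private ProcFinder() {}

    static List<Long> pidsMatching(String marker) {
        // Exclude this JVM, the way grep would otherwise match its own command line.
        long self = ProcessHandle.current().pid();
        return ProcessHandle.allProcesses()
                .filter(p -> p.pid() != self)
                // The command line may be unavailable for processes of other users.
                .filter(p -> p.info().commandLine().map(cmd -> cmd.contains(marker)).orElse(false))
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    /** Requests normal termination, comparable to kill pid (SIGTERM on Unix). */
    static boolean stop(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::destroy).orElse(false);
    }
}
```

For example, ProcFinder.pidsMatching("SRV_KNDConfigServer") returns the candidate process identifiers; confirm each one before calling stop.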

Problem: Slow response time running commands on Dispatcher machine

If you are running the Dispatcher component for load balancing, it is possible to overload the computer with client traffic. The Load Balancer kernel module has the highest priority, and if it is constantly handling client packets, the rest of the system may become unresponsive. Running commands in user space may take a very long time to complete, or may never complete.

If this happens, you should begin to restructure your setup to avoid overloading the Load Balancer machine with traffic. Alternatives include spreading the load across several Load Balancer machines, or replacing the machine with a stronger and faster computer.

When trying to decide if the slow response time on the machine is due to high client traffic, consider whether this occurs during client peak traffic times. Misconfigured systems that cause routing loops can also cause the same symptoms. But before changing the Load Balancer setup, determine whether the symptoms may be due to high client load.

Problem: SSL or HTTPS advisor not registering server loads (when using mac-forwarding)

When using the mac forwarding method, Load Balancer sends packets to the servers using the cluster address, which is aliased on the loopback. Some server applications (such as SSL) require that configuration information (such as certificates) be based on the IP address. That IP address must be the cluster address configured on the loopback, so that it matches the contents of the incoming packets. If the cluster's IP address is not used when configuring the server application, the client request is not forwarded properly to the server.

Problem: Disconnect from host occurs when resize Netscape browser window while using Web administration

If you are using remote Web administration to configure Load Balancer, do not resize (Minimize, Maximize, Restore Down, and so on) the Netscape browser window in which the Load Balancer GUI appears. Because Netscape reloads a page every time browser windows are resized, this will cause a disconnect from host. You will need to reconnect to host each time you resize the window. If you are performing remote Web administration on a Windows platform, use Internet Explorer.

Problem: On Windows systems, corrupted Latin-1 national characters appear in command prompt window

In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde may display as a pi symbol. To fix this, you must change the font properties of the command prompt window. To change the font, do the following:

  1. Click the icon in the upper left corner of the command prompt window
  2. Select Properties, then click the Font tab
  3. The default font is Raster fonts; change this to Lucida Console and click OK

Problem: On HP-UX, Java out of memory or thread error occurs

Some HP-UX 11i installations are pre-configured to allow only 64 threads per process. However, some Load Balancer configurations require more than this amount. For HP-UX systems, set the threads per process to at least 256. To increase this value, use the "sam" utility to set the max_thread_proc kernel parameter. If heavy use is expected, you might need to increase max_thread_proc beyond 256.

To increase the max_thread_proc parameter, do the following:

  1. From the command line, type: sam
  2. Select Kernel Configuration > Configurable Parameters
  3. From the scroll bar, select max_thread_proc
  4. Press Spacebar to highlight max_thread_proc
  5. Press Tab one time, then press the right-arrow key until you select Actions
  6. Press Enter to display the Actions menu, then press M to select Modify Configurable Parameter. (If you do not see this option, highlight max_thread_proc)
  7. Press Tab until you select the Formula/Value field
  8. Type a value of 256 or greater.
  9. Click OK
  10. Press Tab one time, then select Actions
  11. Press K for Process New Kernel
  12. Select Yes
  13. Reboot your system

Problem: On Windows systems, advisors and reach targets mark all servers down

When configuring your adapter on a Load Balancer machine, you must ensure that the following two settings are correct for the advisor to work:

Problem: On Windows platform, resolving IP address to host name when more than one address is configured to an adapter

On Windows platform, when configuring an adapter with more than one IP address, configure the IP address that you want affiliated to the host name first in the registry.

Because Load Balancer is dependent on InetAddress.getLocalHost() in many instances (for example, lbkeys create), multiple IP addresses aliased to a single adapter might cause problems. To avoid this problem, list the IP address to which you want your host name to resolve first in the registry. For example:

  1. Start Regedit
  2. Modify the following value names as follows:
  3. Reboot
  4. Check that your host name resolves to the correct IP address. For example, ping yourhostname.
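Because the text above names InetAddress.getLocalHost() as the call Load Balancer depends on, you can check its result directly from Java after the reboot. The following is a minimal sketch; LocalHostCheck is hypothetical, and the cluster-address test reflects the later note, in the Windows hosts file section, that this address cannot be a cluster address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;

// Hypothetical helper, not part of Load Balancer: compares the address returned by
// InetAddress.getLocalHost() with the address you want the host name to resolve to.
final class LocalHostCheck {
    private LocalHostCheck() {}

    /** The address the JVM currently associates with this machine's host name. */
    static String localHostAddress() throws UnknownHostException {
        return InetAddress.getLocalHost().getHostAddress();
    }

    /** True when the resolved address is the intended one and is not a cluster address. */
    static boolean isExpected(String resolved, String intended, Set<String> clusterAddresses) {
        return resolved.equals(intended) && !clusterAddresses.contains(resolved);
    }
}
```

For example, LocalHostCheck.isExpected(LocalHostCheck.localHostAddress(), "10.1.1.1", clusters), where 10.1.1.1 stands for the address you listed first and clusters holds your cluster addresses. Run it in a fresh JVM so that no earlier lookup result is reused.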

Problem: On Windows systems, after network outage, advisors not working in a high availability setup

By default, when the Windows operating system detects a network outage, it clears its address resolution protocol (ARP) cache, including all static entries. After the network is available, the ARP cache is repopulated by ARP requests sent on the network.

With a high availability configuration, both servers take over primary operations when a loss of network connectivity affects one or both. When the ARP request is sent to repopulate the ARP cache, both servers respond, which causes the ARP cache to mark the entry as not valid. Therefore, the advisors are not able to create a socket to the backup servers.

Preventing the Windows operating system from clearing the ARP cache when there is a loss of connectivity solves this problem. Microsoft has published an article that explains how to accomplish this task. This article is on the Microsoft Web site, located in the Microsoft Knowledge Base, article number 239924: http://support.microsoft.com/default.aspx?scid=kb;en-us;239924.

The following is a summary of the steps, described in the Microsoft article, to prevent the system from clearing the ARP cache:

  1. Use the Registry editor (regedit or regedt32) to open the registry.
  2. View the following key in the registry:
    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
  3. Add the following registry value: Value Name: DisableDHCPMediaSense Value Type: REG_DWORD.
  4. After the key is added, edit the value and set it to 1.
  5. Reboot the machine for the change to take effect.
Note:
This affects the ARP cache regardless of the DHCP setting.

Problem: On Linux systems, do not use "IP address add" command when aliasing multiple clusters on the loopback device

Take care when using Linux kernel 2.4.x servers with Dispatcher's MAC forwarding method. If the server has a cluster address configured on the loopback device using the ip address add command, only one cluster address can be aliased.

When aliasing multiple clusters to the loopback device, use the ifconfig command, for example:

ifconfig lo:num clusterAddress netmask 255.255.255.255 up 

Additionally, there are incompatibilities between the ifconfig method of configuring interfaces and the ip method of configuring interfaces. Best practice suggests that a site choose one method and use that method exclusively.
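Each cluster needs its own lo:num alias. As a small illustration, the following hypothetical Java helper only builds the ifconfig command text, one line per cluster with an increasing alias number; it does not run the commands, and the starting index is your choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Load Balancer: builds (but does not run) one
// ifconfig alias command per cluster address, each with its own lo:num index.
final class LoopbackAliases {
    private LoopbackAliases() {}

    static List<String> commands(int firstIndex, List<String> clusterAddresses) {
        List<String> cmds = new ArrayList<>();
        int index = firstIndex;
        for (String cluster : clusterAddresses) {
            cmds.add("ifconfig lo:" + index + " " + cluster + " netmask 255.255.255.255 up");
            index++;
        }
        return cmds;
    }
}
```

For clusters 9.37.54.69 and 9.37.54.70 with a starting index of 1, it produces the lines for lo:1 and lo:2, which you would then run as root on the server.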

Problem: "Router address not specified or not valid for port method" error message

When adding servers to your Dispatcher configuration, the following error message can result: "Error: Router address not specified or not valid for port method".

Use this checklist to determine the problem:

The default for the router parameter is 0, which indicates the server is local. When you set the server's router address to something other than 0, this indicates that it is a remote server, on a different subnet. For more information on the router parameter on the server add command, see dscontrol server -- configure servers.

If the server that you are adding is located on a different subnet, the router parameter should be the address of the router to be used on the local subnet to communicate with the remote server.
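The two rules above can be checked mechanically: a server on the local subnet keeps router 0, and a server on another subnet needs a router address that is itself on the local subnet. The following minimal Java sketch (IPv4 only) assumes that localAddress is the Dispatcher's own address on that subnet, such as its NFA; RouterCheck and its messages are hypothetical, not Load Balancer output.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical helper, not part of Load Balancer: flags server/router combinations
// that contradict the router rules described above.
final class RouterCheck {
    private RouterCheck() {}

    private static int toInt(String ipv4) throws UnknownHostException {
        byte[] b = InetAddress.getByName(ipv4).getAddress();
        if (b.length != 4) {
            throw new IllegalArgumentException("IPv4 only: " + ipv4);
        }
        return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16) | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
    }

    private static boolean sameSubnet(String a, String b, int mask) throws UnknownHostException {
        return (toInt(a) & mask) == (toInt(b) & mask);
    }

    /** Empty when the combination looks consistent; otherwise a description of the problem. */
    static Optional<String> problem(String localAddress, String netmask, String server, String router)
            throws UnknownHostException {
        int mask = toInt(netmask);
        boolean routerIsZero = "0".equals(router);
        if (sameSubnet(localAddress, server, mask)) {
            return routerIsZero ? Optional.empty()
                    : Optional.of("server is on the local subnet; router is normally left at 0");
        }
        if (routerIsZero) {
            return Optional.of("server is on a different subnet; specify a router on the local subnet");
        }
        if (!sameSubnet(localAddress, router, mask)) {
            return Optional.of("router " + router + " is not on the local subnet");
        }
        return Optional.empty();
    }
}
```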

Problem: On Solaris systems, Load Balancer processes end when you exit the terminal window from which they started

On Solaris systems, after starting Load Balancer scripts (such as dsserver or lbadmin) from a terminal window, if you exit from that window, the Load Balancer process also exits.

To resolve this problem, start the Load Balancer scripts with the nohup command. For example: nohup dsserver. This command prevents the processes started from the terminal session from receiving a hangup signal from the terminal when it exits, allowing the processes to continue even after the terminal session has ended. Use the nohup command in front of any Load Balancer scripts that you want to continue to process beyond the end of a terminal session.

Problem: Delay occurs while loading a Load Balancer configuration

Loading a Load Balancer configuration might take a long time due to Domain Name System (DNS) calls that are made to resolve and verify the server address.

If the DNS of the Load Balancer machine is configured incorrectly, or if DNS in general responds slowly, loading the configuration slows down because the Java processes must wait for the DNS requests they send on the network.

A workaround for this is to add your server addresses and hostnames to your local /etc/hosts file.
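To decide which server names to add to /etc/hosts, you can time each lookup from Java on the Load Balancer machine. This is a minimal sketch; LookupTimer is a hypothetical name. Note that the JVM caches successful lookups, so only the first lookup of each name in a run reflects DNS latency.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Load Balancer: times the lookup of each server
// name to find the slow DNS resolutions that delay loading a configuration.
final class LookupTimer {
    private LookupTimer() {}

    /** Milliseconds per lookup, in input order; -1 means the name did not resolve. */
    static Map<String, Long> time(List<String> names) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String name : names) {
            long start = System.nanoTime();
            try {
                InetAddress.getByName(name);
                result.put(name, (System.nanoTime() - start) / 1_000_000);
            } catch (UnknownHostException e) {
                result.put(name, -1L);
            }
        }
        return result;
    }
}
```

Names with large times, or with -1, are the best candidates for /etc/hosts entries.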

Problem: On Windows systems, an IP address conflict error message appears

If high availability is configured, the cluster addresses may be configured on both machines for a brief period and cause the following error message to occur: There is an IP address conflict with another system on the network. In this case, you can safely ignore the message. It is possible for a cluster address to be briefly configured on both high availability machines at the same time, especially during startup of either machine, or when a takeover has been initiated.

Check the go* scripts to ensure they are correctly configuring and unconfiguring cluster addresses. If you have invoked a configuration file and have go* scripts installed, ensure you do not have any "executor configure" command statements for your cluster addresses in your configuration file, as this will conflict with the configure and unconfigure commands in the go* scripts.

For more information on go* scripts when configuring high availability, see Using scripts.

Problem: Both primary and backup machines are active in a high availability configuration

This problem may occur when the go scripts do not run on either the primary or the backup machine. The go scripts cannot run unless dsserver is started on both machines. Check both machines and make sure that dsserver is running.

Problem: Client requests fail when attempting the return of large page responses

Client requests that result in large page responses time out if the maximum transmission unit (MTU) is not set properly on the Dispatcher machine. For the Dispatcher component's cbr and nat forwarding methods, this can occur because Dispatcher uses a default MTU value rather than negotiating the value.

The MTU is set on each operating system based on the type of communication media (for example, Ethernet or Token-Ring). Routers from the local segment might have a smaller MTU set if they connect to a different type of communication media. Under normal TCP traffic, an MTU negotiation occurs during the connection setup, and the smallest MTU is used to send data between the machines.

Dispatcher does not support MTU negotiation for Dispatcher's cbr or nat forwarding method because it is actively involved as an endpoint for TCP connections. For cbr and nat forwarding, Dispatcher defaults the MTU value to 1500. This value is the typical MTU size for standard Ethernet, so most customers do not need to adjust this setting.

When using Dispatcher's cbr or nat forwarding method, if you have a router to the local segment that has a lower MTU, you must set the MTU on the Dispatcher machine to match the lower MTU.

To resolve this problem, use the following command to set the maximum segment size (mss) value: dscontrol executor set mss new_value

For example:

dscontrol executor set mss 1400 

The default for mss is 1460, which corresponds to the default MTU of 1500 minus 40 bytes for the IP and TCP headers.

The mss setting does not apply for Dispatcher's mac forwarding method or any non-Dispatcher component of Load Balancer.
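The relationship between MTU and mss can be sketched in a few lines of Python. This assumes IPv4 and TCP headers without options (20 bytes each), which is how 1500 yields the default of 1460. The function name and the 1440-byte router MTU are hypothetical values for illustration only.

```python
IPV4_HEADER = 20   # bytes, no IP options (assumption)
TCP_HEADER = 20    # bytes, no TCP options (assumption)

def mss_for_path(mtus):
    """Return the largest TCP segment that fits the smallest MTU on the path."""
    if not mtus:
        raise ValueError("at least one MTU is required")
    smallest = min(mtus)
    mss = smallest - IPV4_HEADER - TCP_HEADER
    if mss <= 0:
        raise ValueError(f"MTU {smallest} is too small for IPv4 and TCP headers")
    return mss

if __name__ == "__main__":
    # Hypothetical path: 1500-byte Ethernet on Dispatcher, 1440-byte MTU on a local router.
    print("dscontrol executor set mss", mss_for_path([1500, 1440]))
```

With those inputs the script prints the command shown in the example above, with 1400 as the value.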

Problem: On Windows systems, "Server not responding" error occurs when issuing dscontrol or lbadmin

When more than one IP address is on a Windows system and the hosts file does not specify the address to associate with the host name, the operating system chooses the smallest address to associate with the host name.

To resolve this problem, update the c:\Windows\system32\drivers\etc\hosts file with your machine host name and the IP address that you want to associate with the host name.

IMPORTANT: The IP address cannot be a cluster address.
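The following Python sketch models the behavior described above so you can see why a cluster address can end up associated with the host name. It interprets "smallest" as numerically smallest, which is an assumption. It compares addresses as IPv4 values rather than as strings, because string comparison would wrongly rank 10.0.0.100 below 10.0.0.5. The addresses and the host name lbhost are hypothetical.

```python
import ipaddress

def default_host_address(addresses):
    """Model the documented behavior: with no hosts-file entry, the
    numerically smallest IPv4 address is associated with the host name."""
    return min(addresses, key=ipaddress.IPv4Address)

def usable_for_hosts_file(address, cluster_addresses):
    """An address is usable in the hosts file only if it is not a cluster address."""
    clusters = {ipaddress.IPv4Address(c) for c in cluster_addresses}
    return ipaddress.IPv4Address(address) not in clusters

if __name__ == "__main__":
    # Hypothetical machine: one cluster address plus two real addresses.
    machine = ["10.0.0.20", "10.0.0.5", "10.0.0.100"]
    clusters = ["10.0.0.5"]
    picked = default_host_address(machine)
    print("resolves by default to", picked)
    print("usable:", usable_for_hosts_file(picked, clusters))
    # A hosts-file line that pins the host name to a non-cluster address:
    print("10.0.0.20    lbhost")
```

In this hypothetical case the default choice is the cluster address, which is exactly the situation the hosts-file entry avoids.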

Problem: High availability Dispatcher machines may fail to synchronize on Linux for S/390 systems with the qeth driver

When using high availability on Linux for S/390 machines with the qeth network driver, the active and standby Dispatchers may fail to synchronize. This problem might be limited to Linux Kernel 2.6.

If this problem occurs, use the following workaround:

Define a channel-to-channel (CTC) network device between the active and standby Dispatcher images and add a heartbeat between the two CTC endpoint IP addresses.

Problem: Tips on configuring high availability

With the high availability function for Load Balancer, a partner machine can take over load balancing if the primary partner fails or is shut down. To maintain connections between the high availability partners, connection records are passed between the two machines. When the backup partner takes over the load balancing function, the cluster IP address is removed from the machine that becomes the backup and added to the new primary machine. Numerous timing and configuration considerations can affect this takeover operation.

The tips listed in this section can help alleviate problems that arise from high availability configuration problems such as:

The following tips are helpful for successful configuration of high availability on your Load Balancer machines.

Note:
For information on configuring the high availability feature see High availability.

Problem: On Linux, Dispatcher configuration limitations when using zSeries or S/390 servers that have Open System Adapter (OSA) cards

In general, when using the MAC forwarding method, servers in the Load Balancer configuration must all be on the same network segment regardless of the platform. Active network devices such as routers, bridges, and firewalls interfere with Load Balancer, because Load Balancer functions as a specialized router that modifies only the link-layer headers to its next and final hop. Any network topology in which the next hop is not the final hop is not valid for Load Balancer.

Note:
Tunnels, such as channel-to-channel (CTC) or inter-user communication vehicle (IUCV), are often supported. However, Load Balancer must forward across the tunnel directly to the final destination; it cannot be a network-to-network tunnel.

There is a limitation for zSeries and S/390 servers that share the OSA card, because this adapter operates differently than most network cards. The OSA card has its own virtual link layer implementation, unrelated to ethernet, that is presented to the Linux and z/OS hosts behind it. Effectively, each OSA card looks just like an ethernet host to the ethernet hosts (though not to the OSA hosts), and the hosts that use it respond to it as if it were ethernet.

The OSA card also performs some functions that relate directly to the IP layer. Responding to ARP (address resolution protocol) requests is one example. Another is that a shared OSA routes IP packets based on the destination IP address, rather than on the ethernet address as a layer 2 switch would. Effectively, the OSA card is a bridged network segment unto itself.

Load Balancer that runs on an S/390 Linux or zSeries Linux host can forward to hosts on the same OSA or to hosts on the ethernet. All the hosts on the same shared OSA are effectively on the same segment.

Load Balancer can forward out of a shared OSA because of the nature of the OSA bridge. The bridge knows the OSA port that owns the cluster IP. The bridge knows the MAC address of hosts directly connected to the ethernet segment. Therefore, Load Balancer can MAC-forward across one OSA bridge.

However, Load Balancer cannot forward into a shared OSA. This includes the Load Balancer on an S/390 Linux when the backend server is on a different OSA card than the Load Balancer. The OSA for the backend server advertises the OSA MAC address for the server IP, but when a packet arrives with the ethernet destination address of the server's OSA and the IP of the cluster, the server's OSA card does not know which of its hosts, if any, should receive that packet. The same principles that permit OSA-to-ethernet MAC-forwarding to work out of one shared OSA do not hold when trying to forward into a shared OSA.

Workaround:

In Load Balancer configurations that use zSeries or S/390 servers that have OSA cards, there are two approaches you can take to work around the problem that has been described.

  1. Using platform features

    If the servers in the Load Balancer configuration are on the same zSeries or S/390 platform type, you can define point-to-point (CTC or IUCV) connections between Load Balancer and each server. Set up the endpoints with private IP addresses. The point-to-point connection is used for Load Balancer-to-server traffic only. Then add the servers with the IP address of the server endpoint of the tunnel. With this configuration, the cluster traffic comes through the Load Balancer OSA card and is forwarded across the point-to-point connection where the server responds through its own default route. The response uses the server's OSA card to leave, which might or might not be the same card.

  2. Using Load Balancer's GRE feature

    If the servers in the Load Balancer configuration are not on the same zSeries or S/390 platform type, or if it is not possible to define a point-to-point connection between Load Balancer and each server, it is recommended that you use Load Balancer's Generic Routing Encapsulation (GRE) feature, which is a protocol that permits Load Balancer to forward across routers.

    When using GRE, the client->cluster IP packet is received by Load Balancer, encapsulated, and sent to the server. At the server, the original client->cluster IP packet is decapsulated, and the server responds directly to the client. The advantage of using GRE is that Load Balancer sees only the client-to-server traffic, not the server-to-client traffic. The disadvantage is that it lowers the maximum segment size (MSS) of the TCP connection due to encapsulation overhead.

    To configure Load Balancer to forward with GRE encapsulation, add the servers using the following command:

    dscontrol server add cluster_add:port:backend_server router backend_server

    Specifying router backend_server is valid only if Load Balancer and the backend server are on the same IP subnet. Otherwise, specify the valid next-hop IP address as the router.

    To configure Linux systems to perform native GRE decapsulation, issue the following commands on each backend server:

    modprobe ip_gre 
    ip tunnel add grelb0 mode gre ikey 3735928559 
    ip link set grelb0 up 
    ip addr add cluster_addr dev grelb0
    Note:
    Do not define the cluster address on the loopback of the backend servers. When using z/OS backend servers, you must use z/OS-specific commands to configure the servers to perform GRE decapsulation.
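To put a number on the MSS reduction mentioned above, the following Python sketch adds up the GRE header sizes. It assumes an IPv4 outer header (20 bytes), the 4-byte base GRE header, and a 4-byte key field, with no checksum or sequence-number fields. The key is inferred from the ikey option in the ip tunnel command; whether Load Balancer's packets carry exactly that key is an assumption. The sketch also shows that the ikey value 3735928559 is 0xDEADBEEF in hexadecimal.

```python
# Header sizes in bytes (IPv4, no IP or TCP options).
OUTER_IPV4_HEADER = 20
GRE_BASE_HEADER = 4
GRE_KEY_FIELD = 4        # present when a key is configured (the ikey option)
INNER_IPV4_AND_TCP = 40

def gre_overhead(keyed=True):
    """Bytes added to each packet by GRE encapsulation over IPv4."""
    return OUTER_IPV4_HEADER + GRE_BASE_HEADER + (GRE_KEY_FIELD if keyed else 0)

def mss_inside_gre(mtu=1500, keyed=True):
    """Largest TCP payload per segment that still fits the MTU after encapsulation."""
    return mtu - gre_overhead(keyed) - INNER_IPV4_AND_TCP

if __name__ == "__main__":
    print("GRE overhead:", gre_overhead(), "bytes")
    print("MSS on a 1500-byte MTU:", mss_inside_gre())
    print("ikey in hexadecimal:", hex(3735928559))
```

Under these assumptions, keyed GRE costs 28 bytes per packet, leaving an MSS of 1432 on a 1500-byte MTU instead of 1460.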

Problem: On some Linux versions, a memory leak occurs when running Dispatcher configured with the manager and advisors

When running Load Balancer configured with the manager and advisor features, large memory leaks can occur on some Red Hat Linux versions. The Java memory leak increases if you configure a small time-interval setting for the advisor.

The combination of the IBM Java SDK versions of the JVM and the Native POSIX Thread Library (NPTL), an enhanced threading library shipped with some Linux distributions that support it, such as Red Hat Enterprise Linux 3.0, can cause the memory leak to occur.

Refer to http://www.ibm.com/developerworks/java/jdk/linux/tested.html for the latest information on Linux systems and the IBM Java SDK shipped with these systems.

As a problem determination tool, use the vmstat or ps command to detect memory leaks.
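If you want to record memory growth over time rather than read ps output by hand, the following Python sketch samples the resident set size (roughly the RSS column that ps shows) of one process from /proc/<pid>/status, so it runs on Linux only. The process to watch, presumably the Java process running dsserver, the sampling interval, and the 1024 kB growth threshold are all assumptions. The leak test is a simple heuristic, not a diagnosis.

```python
import sys
import time

def read_rss_kb(pid):
    """Return the resident set size in kB from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise ValueError(f"no VmRSS entry for process {pid}")

def looks_like_leak(samples_kb, min_growth_kb=1024):
    """Heuristic: RSS never shrinks between samples and grows by at least
    min_growth_kb overall. The threshold is an arbitrary assumption."""
    if len(samples_kb) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_shrinks and samples_kb[-1] - samples_kb[0] >= min_growth_kb

if __name__ == "__main__":
    # Usage (hypothetical): python3 rsswatch.py <pid> [samples] [seconds]
    pid = int(sys.argv[1])
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    interval = float(sys.argv[3]) if len(sys.argv) > 3 else 60
    samples = []
    for i in range(count):
        samples.append(read_rss_kb(pid))
        if i < count - 1:
            time.sleep(interval)
    print(samples)
    print("possible leak" if looks_like_leak(samples) else "no steady growth")
```

Because the text notes that a small advisor interval makes the leak grow faster, a steady climb is easier to see when sampling with that configuration.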

To fix the memory leak, issue the following command to disable the NPTL library before starting Load Balancer on the machine:

export LD_ASSUME_KERNEL=2.4.10

Problem: On SUSE Linux Enterprise Server 9, Dispatcher forwards packets, but the packets do not reach the backend server

On SUSE Linux Enterprise Server 9, when using the MAC forwarding method, the Dispatcher report might indicate that the packet was forwarded (the packet count increases); however, the packet never reaches the backend server.

You might observe one or both of the following when this problem occurs:

This problem might occur because the iptables NAT module is loaded. On SLES 9, there is a possible, but unconfirmed, error in this version of iptables that causes unexpected behavior when it interacts with Dispatcher.

Solution:

Unload the iptables NAT module and Connection Tracking module.

For example:

# lsmod | grep ip
iptable_filter          3072  0
iptable_nat            22060  0
ip_conntrack           32560  1 iptable_nat
ip_tables              17280  2 iptable_filter,iptable_nat
ipv6                  236800  19
# rmmod iptable_nat
# rmmod ip_conntrack

Remove the modules in dependency order: you can remove a module only if its reference count (the number in the third column of the lsmod output) is zero. If you have configured any rules in iptables, you must remove them first; for example: iptables -t nat -F.

The iptable_nat module uses ip_conntrack, so you must first remove iptable_nat module, and then remove ip_conntrack module.

Note:
Even listing the rules configured on a table loads the corresponding module; for example: iptables -t nat -L. Make sure that you do not run such a command after the modules are removed.
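The removal order can be derived mechanically from lsmod output. The following Python sketch is a hypothetical helper: it parses the sample output shown above and orders the modules so that each one is removed only after every module that uses it. It refuses to proceed if a module outside the requested set still uses one of them. It cannot see references that lsmod does not list by name, such as loaded iptables rules, so flush rules first as described above.

```python
SAMPLE = """\
iptable_filter          3072  0
iptable_nat            22060  0
ip_conntrack           32560  1 iptable_nat
ip_tables              17280  2 iptable_filter,iptable_nat
ipv6                  236800  19
"""

def parse_lsmod(text):
    """Map module name -> (reference count, [modules that use it])."""
    modules = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, shell prompts, and blank or malformed lines.
        if len(parts) < 3 or parts[0] in ("Module", "#") or not parts[2].isdigit():
            continue
        users = parts[3].split(",") if len(parts) > 3 else []
        modules[parts[0]] = (int(parts[2]), [u for u in users if u])
    return modules

def removal_order(modules, targets):
    """Order targets so that every module is removed after the modules using it."""
    remaining = [t for t in targets if t in modules]  # ignore modules not loaded
    for name in remaining:
        outside = [u for u in modules[name][1] if u not in remaining]
        if outside:
            raise ValueError(f"{name} is still used by {outside}")
    order = []
    while remaining:
        ready = [m for m in remaining
                 if not any(u in remaining for u in modules[m][1])]
        if not ready:
            raise ValueError("circular dependency among modules")
        for m in ready:
            order.append(m)
            remaining.remove(m)
    return order

if __name__ == "__main__":
    for module in removal_order(parse_lsmod(SAMPLE), ["ip_conntrack", "iptable_nat"]):
        print("rmmod", module)
```

On a live system you could feed the text of a real lsmod run to parse_lsmod instead of SAMPLE. For the sample, the result is iptable_nat first and ip_conntrack second, matching the manual steps.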

Problem: On Windows systems, IP address conflict message appears during high availability takeover

On Windows systems, if you are running Load Balancer's high availability feature, goScripts are used to configure the cluster IP address on the active Load Balancer and to unconfigure it on the backup system when a takeover occurs. If the goScript that configures the cluster IP address on the active machine runs before the goScript that unconfigures the cluster IP address on the backup machine, problems might occur. You might see a popup window that tells you that the system has detected an IP address conflict. If you run the ipconfig /all command, you might also see a 0.0.0.0 IP address on the machine.

Solution:

Issue the following command to manually unconfigure the cluster IP address from the primary machine:

dscontrol executor unconfigure clusterIP

This removes the 0.0.0.0 address from the Windows IP stack.

After the high availability partner releases the cluster IP address, issue the following command to manually add the cluster IP back:

dscontrol executor configure clusterIP

After this command is issued, look for the cluster IP address on the Windows IP stack again by issuing the following command:

ipconfig /all

Problem: Linux iptables can interfere with the routing of packets

Linux iptables can interfere with load balancing of traffic and must be disabled on the Dispatcher machine.

Issue the following command to determine whether the iptables modules are loaded:

lsmod | grep ip_tables

The output from the preceding command might be similar to this:

ip_tables         22400   3 iptable_mangle,iptable_nat,iptable_filter

Issue the following command for each table listed in the output to display the rules for that table:

iptables -t <short_name> -L

For example:

iptables -t mangle -L 
iptables -t nat    -L
iptables -t filter -L    
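As the example shows, the short name is the module name without its iptable_ prefix. The following Python sketch, a hypothetical helper, turns the "Used by" list of ip_tables into the corresponding list commands. The earlier note still applies: listing a table loads its module.

```python
def table_short_names(used_by):
    """Turn the 'Used by' list of ip_tables (for example
    'iptable_mangle,iptable_nat,iptable_filter') into iptables -t names."""
    prefix = "iptable_"
    return [m[len(prefix):] for m in used_by.split(",") if m.startswith(prefix)]

if __name__ == "__main__":
    for name in table_short_names("iptable_mangle,iptable_nat,iptable_filter"):
        print(f"iptables -t {name} -L")
```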

If iptable_nat is loaded, it must be unloaded. Because iptable_nat depends on ip_conntrack, ip_conntrack must also be removed. Issue the following command to unload these two modules:

rmmod iptable_nat ip_conntrack

Upgrading the Java file set provided with the Load Balancer installation

During the Load Balancer installation process, a Java file set is also installed. Load Balancer is the only application that uses the Java version that installs with the product. Do not upgrade this version of the Java file set on your own. If a problem requires an upgrade of the Java file set, report the problem to IBM Service so that the Java file set shipped with Load Balancer can be upgraded to an official fix level.

Problem: Persistent connections might drop during high availability takeover

On Microsoft Windows operating systems, persistent connections might drop during a high availability takeover. This problem exists only when you have a collocated server that uses the MAC forwarding method.

When the cluster IP address is deleted, either from the ethernet interface or the loopback interface, any connections on that IP address are released. When the operating system receives a packet on a connection that has been released, it sends a RST response back to the client and the connection is terminated.

If you cannot tolerate connections being dropped during a high availability takeover, you must not use a collocated server on Windows operating systems when you use the MAC forwarding method.

Solving common problems--CBR

Problem: CBR will not run

This problem can occur when another application is using one of the ports used by CBR. For more information, go to Checking CBR port numbers.

Problem: cbrcontrol or lbadmin command fails

  1. The cbrcontrol command returns: Error: Server not responding. Or, the lbadmin command returns: Error: unable to access RMI server. These errors can result when your machine has a socksified stack. To correct this problem, edit the socks.cnf file to contain the following lines:
    EXCLUDE-MODULE java
    EXCLUDE-MODULE javaw
  2. The administration consoles for Load Balancer interfaces (command line, graphical user interface, and wizards) communicate with cbrserver using remote method invocation (RMI). The default communication uses three ports; each port is set in the cbrserver start script:

    This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue cbrcontrol commands, you might see errors such as Error: Server not responding.

    To avoid this problem, edit the cbrserver script file to set the port used by RMI for the firewall (or other application). Change the line LB_RMISERVERPORT=11199 to LB_RMISERVERPORT=yourPort, where yourPort is a different port.

    When complete, restart cbrserver and open traffic for ports 11099, 10004, 11199, and 11100, or for the chosen port for the host address from which the administration console will be run.

  3. These errors can also occur if you have not already started cbrserver.
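Both "will not start" and port-conflict problems come down to whether another application already holds one of these ports. The following Python sketch checks the default ports listed in this chapter for each component by trying to bind a TCP socket to each one. Any bind failure is treated as "busy", which is a simplification. Run it while the Load Balancer server in question is stopped; otherwise its own ports are reported. The script is a hypothetical aid, not a Load Balancer tool.

```python
import socket

# Default ports listed in this chapter for each component's server.
DEFAULT_PORTS = {
    "cbrserver": [11099, 10004, 11199, 11100],
    "ssserver": [12099, 10004, 12199, 12100],
    "ccoserver": [13099, 10004, 13199, 13100],
    "nalserver": [14099, 10004, 14199, 14100],
}

def port_in_use(port, host="0.0.0.0"):
    """Return True if a TCP socket cannot bind host:port (treated as busy)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False

if __name__ == "__main__":
    for server, ports in DEFAULT_PORTS.items():
        busy = [p for p in ports if port_in_use(p)]
        print(server, "busy ports:", busy or "none")
```

A busy port that is not explained by a running Load Balancer server points to the kind of conflict that the "Checking ... port numbers" sections address.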

Problem: Requests not being load balanced

Caching Proxy and CBR have been started, but requests are not being load balanced. This error can occur if you start Caching Proxy before starting the executor. If this happens, the stderr log for Caching Proxy will contain the following error message: "ndServerInit: Could not attach to executor." To avoid this problem, start the executor before starting Caching Proxy.

Problem: On Solaris systems, cbrcontrol executor start command fails

On Solaris systems, the cbrcontrol executor start command returns: "Error: Executor was not started." This error occurs if you do not configure IPC (Inter-process Communication) for the system so that the maximum size of a shared memory segment and the number of semaphore IDs are larger than the operating system's defaults. To increase the size of the shared memory segment and the number of semaphore IDs, you must edit the /etc/system file. For more information on how to configure this file, see the section on modifying the system defaults for IPCs (Inter-process Communication).

Problem: Syntactical or configuration error

If the URL rule does not work, this can be a result of either a syntactical or configuration error. For this problem check the following:

Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards

On Windows platform when using a Matrox AGP card, unexpected behavior can occur in the Load Balancer GUI. When clicking the mouse, a block of space slightly larger than the mouse pointer can become corrupted causing possible highlighting reversal or images to shift out of place on the screen. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.

Problem: Disconnect from host occurs when resizing the Netscape browser window while using Web administration

If you are using remote Web administration to configure Load Balancer, do not resize (Minimize, Maximize, Restore Down, and so on) the Netscape browser window in which the Load Balancer GUI appears. Because Netscape reloads the page every time the browser window is resized, resizing disconnects you from the host, and you must reconnect to the host each time. If you are performing remote Web administration on a Windows platform, use Internet Explorer.

Problem: On Windows platform, corrupted Latin-1 national characters appear in command prompt window

In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde may display as a pi symbol. To fix this, you must change the font properties of the command prompt window. To change the font, do the following:

  1. Click the icon in the upper left corner of the command prompt window
  2. Select Properties, then click the Font tab
  3. The default font is Raster fonts; change this to Lucida Console and click OK

Problem: On HP-UX, Java out of memory/thread error occurs

Some HP-UX 11i installations are pre-configured to allow only 64 threads per process. However, some Load Balancer configurations require more than this amount. For HP-UX systems, set the threads per process to at least 256. To increase this value, use the "sam" utility to set the max_thread_proc kernel parameter. If heavy use is expected, you might need to increase max_thread_proc beyond 256.

To increase max_thread_proc, refer to the steps to increase the max_thread_proc parameter.

Problem: On Windows systems, advisors and reach targets mark all servers down

When configuring your adapter on a Load Balancer machine, you must ensure that the following two settings are correct for the advisor to work:

Refer to the section on disabling task offloading for instructions on configuring this setting.

Problem: On Windows systems, resolving IP address to host name when more than one address is configured to an adapter

On Windows platforms, when configuring an adapter with more than one IP address, configure the IP address that you want associated with the host name first in the registry.

Because Load Balancer is dependent on InetAddress.getLocalHost() in many instances (for example, lbkeys create), multiple IP addresses aliased to a single adapter might cause problems. To avoid this problem, list the IP address to which you want your host name to resolve first in the registry.

To address this issue, reorder the adapters in the Advanced Settings for the Control Panel’s Network Connections option. For example:

  1. Open the Control Panel.
  2. Open the Network Connections option.
  3. From the menu bar, select Advanced > Advanced Settings...
  4. Reorder the adapters that are listed in the Advanced Settings panel.
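After reordering the adapters, you can check what the local host name resolves to. The following Python sketch performs a lookup that is roughly analogous to the one behind Java's InetAddress.getLocalHost(). The JVM is not guaranteed to pick the same first address, so treat the result as an indication only. The helper name is hypothetical.

```python
import socket

def local_host_addresses():
    """Return (host name, addresses) that the resolver gives for this machine."""
    name = socket.gethostname()
    try:
        _canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        addresses = []
    return name, addresses

if __name__ == "__main__":
    name, addresses = local_host_addresses()
    if not addresses:
        print(f"{name} does not resolve")
    else:
        print(f"{name} resolves first to {addresses[0]} (all: {addresses})")
```

If the first address printed is not the one you intended, the adapter order or the hosts file still needs attention.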

Solving common problems--Site Selector

Problem: Site Selector will not run

This problem can occur when another application is using one of the ports used by Site Selector. For more information, go to Checking Site Selector port numbers.

Problem: Site Selector does not round-robin traffic from Solaris clients

Symptom: Site Selector component does not round-robin incoming requests from Solaris clients.

Possible cause: Solaris systems run a name service cache daemon. If this daemon is running, the subsequent resolver request is answered from this cache instead of querying Site Selector.

Solution: Turn off the name service cache daemon on the Solaris machine.

Problem: sscontrol or lbadmin command fails

  1. The sscontrol command returns: Error: Server not responding. Or, the lbadmin command returns: Error: unable to access RMI server. These errors can result when your machine has a socksified stack. To correct this problem, edit the socks.cnf file to contain the following lines:
    EXCLUDE-MODULE java
    EXCLUDE-MODULE javaw
  2. The administration consoles for Load Balancer interfaces (command line, graphical user interface, and wizards) communicate with ssserver using remote method invocation (RMI). The default communication uses three ports; each port is set in the ssserver start script:

    This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue sscontrol commands, you might see errors such as Error: Server not responding.

    To avoid this problem, edit the ssserver script file to set the port used by RMI for the firewall (or other application). Change the line LB_RMISERVERPORT=12199 to LB_RMISERVERPORT=yourPort, where yourPort is a different port.

    When complete, restart ssserver and open traffic for ports 12099, 10004, 12199, and 12100, or for the chosen port for the host address from which the administration console will be run.

  3. These errors can also occur if you have not already started ssserver.

Problem: The ssserver is failing to start on Windows platform

Site Selector must be able to participate in DNS, and all the machines involved in the configuration should also participate in this system. Windows systems do not always require the configured host name to be in DNS, but Site Selector requires that its host name be defined in DNS to start properly.

Verify that this host is defined in DNS. To get more information about errors, edit the ssserver.cmd file and remove the "w" from "javaw" so that error output is displayed in the command window.

Problem: Site Selector with duplicate routes not load balancing correctly

Site Selector's name server does not bind to any one address on the machine; it responds to requests destined for any valid IP address on the machine. Site Selector relies on the operating system to route the response back to the client. If the Site Selector machine has multiple adapters and more than one of them is attached to the same subnet, the operating system might send the response to the client from a different address than the one on which the request was received. Some client applications do not accept a response received from an address other than the one to which the request was sent. As a result, the name resolution appears to fail.
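The following Python sketch imitates a strict client to show why such replies are lost. It sends one UDP request and accepts a reply only if the reply comes from exactly the address and port the request was sent to. This is an illustration of the general client behavior described above, not code from any particular resolver. It assumes the server is given as a numeric IPv4 address and port.

```python
import socket

def query_strict(server, payload, timeout=2.0):
    """Send one UDP request to server (numeric IP, port) and accept a reply
    only if it comes back from exactly that address and port. Replies from
    any other address are ignored; if no matching reply arrives within the
    timeout, socket.timeout is raised."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, server)
        while True:
            data, source = sock.recvfrom(4096)
            if source == server:
                return data
            # A reply sent from another local address of the server machine
            # (for example, a second adapter on the same subnet) lands here
            # and is discarded, so name resolution appears to fail.
```

A reply that leaves through a different adapter has a different source address, fails the comparison, and the client eventually times out even though the server did answer.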

Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards

On Windows platform when using a Matrox AGP card, unexpected behavior can occur in the Load Balancer GUI. When clicking the mouse, a block of space slightly larger than the mouse pointer can become corrupted causing possible highlighting reversal or images to shift out of place on the screen. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.

Problem: Disconnect from host occurs when resizing the Netscape browser window while using Web administration

If you are using remote Web administration to configure Load Balancer, do not resize (Minimize, Maximize, Restore Down, and so on) the Netscape browser window in which the Load Balancer GUI appears. Because Netscape reloads the page every time the browser window is resized, resizing disconnects you from the host, and you must reconnect to the host each time. If you are performing remote Web administration on a Windows platform, use Internet Explorer.

Problem: On Windows platform, corrupted Latin-1 national characters appear in command prompt window

In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde may display as a pi symbol. To fix this, you must change the font properties of the command prompt window. To change the font, do the following:

  1. Click the icon in the upper left corner of the command prompt window
  2. Select Properties, then click the Font tab
  3. The default font is Raster fonts; change this to Lucida Console and click OK

Problem: On HP-UX, Java out of memory/thread error occurs

Some HP-UX 11i installations are pre-configured to allow only 64 threads per process. However, some Load Balancer configurations require more than this amount. For HP-UX systems, set the threads per process to at least 256. To increase this value, use the "sam" utility to set the max_thread_proc kernel parameter. If heavy use is expected, you might need to increase max_thread_proc beyond 256.

To increase max_thread_proc, refer to the steps to increase the max_thread_proc parameter.

Problem: On Windows systems, advisors and reach targets mark all servers down

When configuring your adapter on a Load Balancer machine, you must ensure that the following two settings are correct for the advisor to work:

Refer to the section on disabling task offloading for instructions.

Solving common problems--Cisco CSS Controller

Problem: ccoserver will not start

This problem can occur when another application is using one of the ports used by the Cisco CSS Controller's ccoserver. For more information, see Checking Cisco CSS Controller port numbers.

Problem: ccocontrol or lbadmin command fails

  1. The ccocontrol command returns: Error: Server not responding. Or, the lbadmin command returns: Error: unable to access RMI server. These errors can result when your machine has a socksified stack. To correct this problem, edit the socks.cnf file to contain the following lines:
    EXCLUDE-MODULE java
    EXCLUDE-MODULE javaw
  2. The administration consoles for Load Balancer interfaces (command line and graphical user interface) communicate with ccoserver using remote method invocation (RMI). The default communication uses three ports; each port is set in the ccoserver start script:

    This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue ccocontrol commands, you might see errors such as Error: Server not responding.

    To avoid this problem, edit the ccoserver script file to set the port used by RMI for the firewall (or other application). Change the line CCO_RMISERVERPORT=13199 to CCO_RMISERVERPORT=yourPort, where yourPort is a different port.

    When complete, restart ccoserver and open traffic for ports 13099, 10004, 13199, and 13100, or for the chosen port for the host address from which the administration console will be run.

  3. These errors can also occur if you have not already started ccoserver.

Problem: Cannot create registry on port 13099

This problem can occur when a valid product license is missing. When you attempt to start ccoserver, you receive the following message:

Your license has expired.  Contact your local IBM
representative or authorized IBM reseller.

To correct this problem:

  1. If you have already attempted to start ccoserver, type ccoserver stop.
  2. Copy your valid license to the following directory:
  3. Type ccoserver to start the server.

Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards

On Windows platform when using a Matrox AGP card, unexpected behavior can occur in the Load Balancer GUI. When clicking the mouse, a block of space slightly larger than the mouse pointer can become corrupted causing possible highlighting reversal or images to shift out of place on the screen. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.

Problem: Received a connection error when adding a consultant

You might experience a connection error, due to incorrect configuration settings, when adding a consultant. To fix this problem:

Problem: Weights are not being updated on the switch

To fix this problem

Problem: Refresh command did not update the consultant configuration

Increase the consultant loglevel and retry the command. If it fails again, search the log for SNMP timeout or other SNMP communication errors.
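The log file location and exact message wording are not given here, so the following Python sketch assumes only one message per line. It reports every line that mentions SNMP together with a timeout or error keyword. The keyword list is a guess; adjust it to the messages you actually see in the consultant log.

```python
import sys

# Assumed keywords; extend to match the real log messages.
KEYWORDS = ("timeout", "timed out", "error", "fail")

def snmp_problems(lines):
    """Return (line_number, text) for lines mentioning SNMP and a problem keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        low = line.lower()
        if "snmp" in low and any(k in low for k in KEYWORDS):
            hits.append((number, line.rstrip("\n")))
    return hits

if __name__ == "__main__":
    # Usage (hypothetical): python3 snmpgrep.py <consultant_log_file>
    with open(sys.argv[1], errors="replace") as log:
        for number, text in snmp_problems(log):
            print(f"{number}: {text}")
```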

Problem: Disconnect from host occurs when resizing the Netscape browser window while using Web administration

If you are using remote Web administration to configure Load Balancer, do not resize (Minimize, Maximize, Restore Down, and so on) the Netscape browser window in which the Load Balancer GUI appears. Because Netscape reloads the page every time the browser window is resized, resizing disconnects you from the host, and you must reconnect to the host each time. If you are performing remote Web administration on a Windows platform, use Internet Explorer.

Problem: On Windows platform, corrupted Latin-1 national characters appear in command prompt window

In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde may display as a pi symbol. To fix this, you must change the font properties of the command prompt window. To change the font, do the following:

  1. Click the icon in the upper left corner of the command prompt window
  2. Select Properties, then click the Font tab
  3. The default font is Raster fonts; change this to Lucida Console and click OK

Problem: On HP-UX, Java out of memory/thread error occurs

Some HP-UX 11i installations are pre-configured to allow only 64 threads per process. However, some Load Balancer configurations require more than this amount. For HP-UX systems, set the threads per process to at least 256. To increase this value, use the "sam" utility to set the max_thread_proc kernel parameter. If heavy use is expected, you might need to increase max_thread_proc beyond 256.

To increase max_thread_proc, refer to the steps to increase the max_thread_proc parameter.

Solving common problems--Nortel Alteon Controller

Problem: nalserver will not start

This problem can occur when another application is using one of the ports used by the Nortel Alteon Controller's nalserver. For more information, see Checking Nortel Alteon Controller port numbers.

Problem: nalcontrol or lbadmin command fails

  1. The nalcontrol command returns: Error: Server not responding. Or, the lbadmin command returns: Error: unable to access RMI server. These errors can result when your machine has a socksified stack. To correct this problem, edit the socks.cnf file to contain the following lines:
    EXCLUDE-MODULE java
    EXCLUDE-MODULE javaw
  2. The administration consoles for Load Balancer interfaces (command line and graphical user interface) communicate with nalserver using remote method invocation (RMI). The default communication uses three ports; each port is set in the nalserver start script:

    This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue nalcontrol commands, you might see errors such as Error: Server not responding.

    To avoid this problem, edit the nalserver script file to set the port that RMI uses for the firewall (or other application). Change the line NAL_RMISERVERPORT=14199 to NAL_RMISERVERPORT=yourPort, where yourPort is a different port.

    When this is complete, restart nalserver and open traffic on ports 14099, 10004, 14199, and 14100 (or on the port you chose) for the host address from which the administration console will be run.

  3. These errors can also occur if you have not already started nalserver.

Problem: Cannot create registry on port 14099

This problem can occur when a valid product license is missing. When you attempt to start nalserver, you receive the following message:

Your license has expired.  Contact your local IBM
representative or authorized IBM reseller.

To correct this problem:

  1. If you have already attempted to start nalserver, type nalserver stop.
  2. Copy your valid license to the
  3. Type nalserver to start the server.

Problem: On Windows platform, unexpected GUI behavior when using Matrox AGP video cards

On Windows platforms, when a Matrox AGP card is used, unexpected behavior can occur in the Load Balancer GUI. When you click the mouse, a block of the screen slightly larger than the mouse pointer can become corrupted, which can reverse highlighting or shift images out of place. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.

Problem: Disconnect from host occurs when resizing the Netscape browser window while using Web administration

If you are using remote Web administration to configure Load Balancer, do not resize (Minimize, Maximize, Restore Down, and so on) the Netscape browser window in which the Load Balancer GUI appears. Netscape reloads the page every time the browser window is resized, which disconnects you from the host, so you must reconnect to the host each time you resize the window. If you are performing remote Web administration on a Windows platform, use Internet Explorer.

Problem: Received a connection error when adding a consultant

You might receive a connection error when adding a consultant because of incorrect configuration settings. To fix this problem:

Problem: Weights are not being updated on the switch

To fix this problem:

Problem: Refresh command did not update the consultant configuration

Increase the consultant loglevel and retry the command. If it fails again, search the log for SNMP timeout or other SNMP communication errors.
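Searching a large log by hand is error-prone, so the search can be automated. This chapter gives neither the consultant log location nor the exact wording of SNMP messages, so the following Java sketch (Java 11 or later; class name hypothetical) assumes only that relevant lines contain the text SNMP together with timeout, timed out, or error, matched case-insensitively. Pass the path of your consultant log as the first argument, and adjust the match terms to what your log actually contains.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SnmpLogScan {
    /** True if the line mentions SNMP together with a timeout or an error (case-insensitive). */
    static boolean isSnmpProblem(String line) {
        String l = line.toLowerCase(Locale.ROOT);
        return l.contains("snmp")
                && (l.contains("timeout") || l.contains("timed out") || l.contains("error"));
    }

    /** Returns the matching lines of the log file, in order. */
    static List<String> findSnmpProblems(Path log) throws IOException {
        // ISO-8859-1 maps every byte to a character, so an unexpected encoding cannot abort the scan.
        try (Stream<String> lines = Files.lines(log, StandardCharsets.ISO_8859_1)) {
            return lines.filter(SnmpLogScan::isSnmpProblem).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the consultant log (installation-specific; not given in this chapter)
        for (String hit : findSnmpProblems(Path.of(args[0]))) {
            System.out.println(hit);
        }
    }
}
```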

Problem: On Windows systems, corrupted Latin-1 national characters appear in command prompt window

In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde might display as a pi symbol. To fix this, change the font properties of the command prompt window as follows:

  1. Click the icon in the upper left corner of the command prompt window
  2. Select Properties, then click the Font tab
  3. The default font is Raster fonts; change this to Lucida Console and click OK

Problem: On HP-UX, Java out of memory or thread error occurs

Some HP-UX 11i installations are preconfigured to allow only 64 threads per process, but some Load Balancer configurations require more threads than that. On HP-UX systems, set the maximum number of threads per process to at least 256 by using the "sam" utility to set the max_thread_proc kernel parameter. If you expect heavy use, you might need to increase max_thread_proc beyond 256.

To increase max_thread_proc, refer to the steps to increase the max_thread_proc parameter.

Solving common problems--Metric Server

Problem: Metric Server IOException on Windows platform running .bat or .cmd user metric files

On Metric Servers running on Windows platforms, you must use the full metric name for user-written metrics. For example, specify usermetric.bat instead of usermetric. The name usermetric is valid on the command line but does not work when run from within the runtime environment. If you do not use the full metric name, Metric Server reports an IOException. To see the exception, set the LOG_LEVEL variable to 3 in the metricserver command file, then check the log output. In this example, the exception appears as:

 ... java.io.IOException: CreateProcess: usermetric error=2
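To work out which full name to specify, you can look for a matching .bat or .cmd file in the directory that holds your user metric files. The following Java sketch is hypothetical (fullMetricName and the scriptDir parameter are not part of Load Balancer; Java 11 or later): given that directory and a bare name such as usermetric, it returns usermetric.bat or usermetric.cmd when that file exists, and leaves names that already carry one of those extensions unchanged.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class MetricNameResolver {
    // Extensions of Windows batch files, which must be named in full.
    private static final List<String> BATCH_EXTENSIONS = List.of(".bat", ".cmd");

    /**
     * Hypothetical helper: returns the full metric name to configure.
     * A name that already ends in .bat or .cmd is returned unchanged; otherwise
     * the first existing scriptDir/metric.bat or scriptDir/metric.cmd wins.
     */
    static Optional<String> fullMetricName(Path scriptDir, String metric) {
        String lower = metric.toLowerCase(Locale.ROOT);
        for (String ext : BATCH_EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return Optional.of(metric);
            }
        }
        for (String ext : BATCH_EXTENSIONS) {
            if (Files.isRegularFile(scriptDir.resolve(metric + ext))) {
                return Optional.of(metric + ext);
            }
        }
        return Optional.empty(); // no batch file found: the name cannot be expanded
    }

    public static void main(String[] args) {
        // args[0]: directory holding user metric files; args[1]: bare metric name
        System.out.println(fullMetricName(Path.of(args[0]), args[1]).orElse("not found"));
    }
}
```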

Problem: Metric Server not reporting loads to Load Balancer machine

There can be several reasons why Metric Server is not reporting load information to Load Balancer. To determine the cause, perform the following checks:

Problem: Metric Server log reports "Signature is necessary for access to agent"

The Metric Server log reports this error message after key files have been transferred to the server.

This error is logged when the key file fails authorization against its paired key because the key pair is corrupted. To correct this problem, try the following:

Problem: On AIX systems, while running Metric Server under heavy stress, ps -vg command output may become corrupted

When Metric Server runs under heavy stress on a multiprocessor AIX platform (4.3.3, 32-bit 5.1, or 64-bit 5.1), output from the ps -vg command might be corrupted. For example:

 55742 - A 88:19 42 18014398509449680  6396 32768 22 36 2.8 1.0 java -Xms

The SIZE field, the RSS field, or both might show an implausibly large amount of memory in use, such as the SIZE value of 18014398509449680 in the example above.

This is a known AIX kernel problem, which APAR IY33804 corrects. Obtain the fix from AIX support at http://techsupport.services.ibm.com/server/fixes, or contact your local AIX support representative.
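If you monitor Metric Server memory use with a program that reads ps -vg output, it can help to flag readings like the one above as implausible instead of treating them as real growth. This Java sketch (class name hypothetical) assumes the column order of the example line (PID, TTY, STAT, TIME, PGIN, SIZE, RSS, LIM, TSIZ, TRS, %CPU, %MEM, COMMAND) and a ceiling that you choose in the same units as the ps output. It is a diagnostic aid, not a substitute for the APAR fix.

```java
final class PsSizeCheck {
    // Field positions assumed from the example line:
    // PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND
    private static final int SIZE_FIELD = 5;
    private static final int RSS_FIELD = 6;

    /** True if SIZE or RSS exceeds maxPlausible (same units as the ps output). */
    static boolean looksCorrupted(String psLine, long maxPlausible) {
        String[] fields = psLine.trim().split("\\s+");
        if (fields.length <= RSS_FIELD) {
            throw new IllegalArgumentException("unexpected ps line: " + psLine);
        }
        return value(fields[SIZE_FIELD]) > maxPlausible || value(fields[RSS_FIELD]) > maxPlausible;
    }

    // Unparseable values (including ones too large for a long) are treated as implausible.
    private static long value(String field) {
        try {
            return Long.parseLong(field);
        } catch (NumberFormatException e) {
            return Long.MAX_VALUE;
        }
    }
}
```

Skip the ps header line before calling looksCorrupted, because its SIZE column holds text rather than a number.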

Problem: Configuring Metric Server in a two-tier configuration with Site Selector load-balancing across high-availability Dispatchers

In a two-tier Load Balancer configuration, if Site Selector (first tier) is load balancing across a pair of Dispatcher high-availability partners (second tier), you must complete additional steps to configure the Metric Server component: Metric Server must be configured to listen on a new IP address that is used only by Metric Server. On the two high-availability Dispatcher machines, Metric Server is active only on the active Dispatcher.

To correctly configure this setup, complete the following steps:

Problem: Scripts, running on multi-CPU Solaris machines, produce unwanted console messages

When running on multi-CPU Solaris machines, the metricserver, cpuload, and memload scripts can produce unwanted console messages. This behavior is caused by the scripts' use of the vmstat system command to gather CPU and memory statistics from the kernel. Some messages that vmstat returns indicate that the state of the kernel has changed. The scripts cannot parse these messages, so the shell writes unnecessary messages to the console.

Examples of these console messages are:

/opt/ibm/edge/lb/ms/script/memload[29]: TOTAL=: syntax error
/opt/ibm/edge/lb/ms/script/memload[31]: LOAD=4*100/0: divide by zero
/opt/ibm/edge/lb/ms/script/memload[29]: TOTAL=659664+: more tokens expected

These messages can be ignored.
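The sample messages show shell arithmetic running on empty or partial fields (TOTAL= with no value, and a division by zero). If you write your own user metric that reads vmstat output, you can avoid the same failure by ignoring any line that is not purely numeric and by refusing to divide by a zero total. The following Java sketch illustrates both guards; the class and method names are hypothetical, and it is not the shipped memload script.

```java
import java.util.Optional;
import java.util.OptionalLong;

final class VmstatGuards {
    /** Parses a line of whitespace-separated integers; empty for headers, blanks, or messages. */
    static Optional<long[]> parseNumericLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        String[] tokens = trimmed.split("\\s+");
        long[] values = new long[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            try {
                values[i] = Long.parseLong(tokens[i]);
            } catch (NumberFormatException e) {
                return Optional.empty(); // not a data line: skip it instead of failing later
            }
        }
        return Optional.of(values);
    }

    /** used * 100 / total, or empty when total is not positive (the divide-by-zero case). */
    static OptionalLong percent(long used, long total) {
        return total > 0 ? OptionalLong.of(used * 100 / total) : OptionalLong.empty();
    }
}
```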

Problem: After starting Metric Server, metric value returns -1

This problem might occur when the key files lose their integrity during transfer to the client.

If you use FTP to transfer your key files from the Load Balancer machine to the back-end server, make sure that you use binary mode to put or get key files to or from the FTP server.
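A transfer in ASCII mode can rewrite line-ending bytes, so a key file that looks unchanged can still differ byte for byte. One way to confirm that a key file arrived intact is to compute a SHA-256 digest of it on both machines and compare the results. The following Java sketch (Java 11 or later; the class name is hypothetical) prints the digest of each file named on the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class KeyFileDigest {
    /** Hex-encoded SHA-256 of the file's exact bytes (key files are small, so read them whole). */
    static String sha256Hex(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(sha256Hex(Path.of(arg)) + "  " + arg);
        }
    }
}
```

Run it against the key file on the Load Balancer machine and again on the back-end server; if the digests differ, transfer the file again in binary mode.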