IBM WebSphere Application Server, Advanced Edition

Tuning Guide


What's new for performance tuning?

Performance tuner wizard

Dynamic fragment caching

Symptom table

Tuning basics

   What influences tuning?

   Types of tuning

      Parameter tuning

           Tuning parameters with a high impact
           Tuning parameters for avoiding failures

    Adjusting WebSphere Application Server system queues

       WebSphere queuing network

             Closed queues versus open queues
             Queue settings in WebSphere

       Determining settings

             Queuing before WebSphere

             Drawing a throughput curve

             Queue adjustments
             Queue adjustments for accessing patterns

       Queuing and enterprise beans

       Queuing and clustering

    Tuning Secure Socket Layer

       Overview of handshake and bulk encryption/decryption

       How to enhance Secure Socket Layer performance

    Application assembly performance checklist

    Tuning Java memory

       The garbage collection bottleneck

       The garbage collection gauge

       Detecting over-utilization of objects

       Detecting memory leaks

       Java heap parameters

    Number of connections to DB2

    Solaris TCP parameters

    Workload management topology

Individual performance parameters

    Hardware capacity and settings

       Processor speed

       Memory

       Network

    Operating system settings

       Windows NT/2000 TCP/IP parameters

             Windows NT/2000 TcpTimedWaitDelay
             Windows NT/2000 MaxUserPort

       AIX (4.3.3)

             AIX file descriptors (ulimit)

       Solaris

             Solaris file descriptors (ulimit)
             Solaris tcp_time_wait_interval
             Solaris tcp_fin_wait_2_flush_interval
             Solaris tcp_keepalive_interval
             Other Solaris TCP parameters
             Solaris kernel semsys:seminfo_semume
             Solaris kernel semsys:seminfo_semopm

       HP-UX 11

             Setting the Virtual Page Size to 64K for WebSphere Application Server JVM
             HP-UX 11 tcp_conn_request_max
             HP-UX 11 kernel parameter recommendations

    The Web server

       Web server configuration reload interval

       IBM HTTP server - AIX and Solaris

             MaxClients
             MinSpareServers, MaxSpareServers, and StartServers

       Netscape Enterprise server - AIX and Solaris

             Active threads

       Microsoft Internet Information Server - Windows NT/2000

             Internet Information Server (IIS) Permission Properties
             Number of expected hits per day
             ListenBackLog parameter

       IBM HTTP server - Linux

             MaxRequestsPerChild

       IBM HTTP server - Windows NT/2000

             ThreadsPerChild
             ListenBackLog

    The WebSphere application server process

       Adjust the application server process priority

       Web containers

             Web container maximum thread size
             Transport Maximum Keep-Alive
             Transport Maximum Requests per Keep-Alive
             URL invocation cache
             Allow thread allocation beyond maximum

       EJB container

             Cache settings
             Break CMP enterprise beans into several enterprise bean modules

       Security

             Turn off security when you do not need it
             Fine-tune the security cache time out for the environment
             Security cache types and sizes (system parameters)
             Configure Secure Socket Layer sessions appropriately

       Object Request Broker (ORB)

             Pass-by-value versus pass-by-reference (NoLocalCopies)
             com.ibm.CORBA.ServerSocketQueueDepth
             com.ibm.CORBA.MaxOpenConnections and ORB Connection Cache Maximum
             Object Request Broker thread pool size
             Using JNI ReaderManager and Reader Threads

    Java Virtual Machines (JVMs)

       Sun JDK 1.3 HotSpot -server warmup

       Sun JDK 1.3 HotSpot new generation pool size

       Just In Time (JIT) compiler

       Heap size settings

       Class garbage collection

    The database

       Database location

       WebSphere data source connection pool size

       Prepared statement cache size

       DB2

             Use TCP sockets for DB2 on Linux
             DB2 MaxAppls
             DB2 MaxAgents
             DB2 buffpage
             DB2 query optimization level
             DB2 reorgchk
             DB2 MinCommit

    Session management

    WebSphere Application Server Enterprise Extensions Message Listener

       Maximum sessions

        Multiple application servers listening on the same queue

Additional references

Performance tool procedures

       Starting Windows NT/2000 Performance Monitor



What's new for performance tuning?

Performance tuner wizard
Dynamic fragment caching

Performance tuner wizard

The Performance tuner wizard is a tool included in WebSphere Application Server, Advanced Edition that gathers the most common performance-related settings of the application server in one place. Use the Performance tuner wizard to optimize the settings for your applications, servlets, enterprise beans, data sources, and expected load. Parameters that can be set include:

To start the wizard from the administrative console, select Console > Wizards > Performance Tuner.

Dynamic fragment caching

Dynamic fragment caching is the ability to cache the output of dynamic servlets and JSP files, a technology that dramatically improves application performance. This cache, working within the JVM of an application server, intercepts calls to a servlet service() method, checking whether it can serve the invocation from the cache rather than re-execute the servlet. Because J2EE applications have high read-write ratios and can tolerate a small degree of latency in the freshness of their data, fragment caching can create an opportunity for dramatic gains in server response time, throughput and scalability.

Once a servlet has been executed (generating the output that will be cached), a cache entry is generated containing that output. Also generated are side effects of the execution (that is, invocations of other servlets or JSP files), as well as metadata about the entry, including time-out and entry priority information. Unique entries are distinguished by an ID string generated from the HttpServletRequest object for each invocation of the servlet. This means a servlet can be cached based on the request parameters, the URI used to invoke the servlet, or session information. Because WebSphere Application Server compiles JSP files into servlets, the terms JSP and servlet are used interchangeably in this document (except when declaring elements within an XML file).

To set this:

  1. From the administrative console select the application server you are tuning.
  2. Click Services > Web Container Service > Edit Properties.
  3. Select the Servlet Caching tab and check the Enable Servlet Caching box. Checking this box enables the caching facility; however, nothing is cached until specific servlets or JSP files are selected.
  4. Click OK and save the changes.
  5. Restart the application server.

Symptom table

Take a shortcut into tuning by reviewing the symptom table. The table is designed for easy access to symptoms and a quick link to tuning information related to that symptom. The table contains the following types of information:

Symptom: Throughput and response time are undesirable.
See: Processor speed

Symptom: AIX reports a memory allocation error, or Solaris reports too many open files.
See: AIX file descriptors (ulimit) or Solaris file descriptors (ulimit)

Symptom: Solaris: The server stalls during peak periods, responses take minutes, processor utilization remains high with all activity in the system processes, and netstat shows many sockets open to port 80 in CLOSE_WAIT or FIN_WAIT_2 state.
See: Solaris tcp_time_wait_interval and Solaris tcp_fin_wait_2_flush_interval

Symptom: Windows NT/2000: Netstat shows too many sockets in TIME_WAIT state.
See: Windows NT/2000 TcpTimedWaitDelay

Symptom: Throughput is undesirable and the application server priority has not been adjusted.
See: Adjusting the operating system priority of the WebSphere Application Server process

Symptom: Under load, client requests do not arrive at the Web server because they time out or are rejected.
See: HP-UX 11 tcp_conn_request_max (for HP-UX 11); ListenBackLog parameter (for IIS on Windows NT/2000); ListenBackLog (for IBM HTTP Server on Windows NT/2000)

Symptom: Windows NT/2000: WebSphere Application Server performance decreased after an application server from another vendor was installed.
See: IIS Permission Properties

Symptom: The Resource Analyzer percent maxed metric indicates that the Web container thread pool is too small.
See: Web container maximum thread size

Symptom: Netstat shows too many sockets in TIME_WAIT state for port 9080.
See: Web container transport maximum Keep-Alive

Symptom: Too much disk input/output occurs due to paging.
See: Heap size settings

Symptom: The Resource Analyzer percent used metric for a data source connection pool indicates the pool size is too large.
See: WebSphere data source connection pool size

Symptom: The Resource Analyzer prepared statement cache discards metric indicates the data source prepared statement cache size is too small.
See: Prepared statement cache size

Symptom: Too much disk input/output occurs due to DB2 writing log records.
See: DB2 MinCommit

Symptom: The Resource Analyzer percent maxed metric indicates the Object Request Broker thread pool is too small.
See: Queuing and enterprise beans

Symptom: The Resource Analyzer Java Virtual Machine Profiler Interface (JVMPI) indicates over-utilization of objects when too much time is spent in garbage collection.
See: Detecting over-utilization of objects

Symptom: The Resource Analyzer used memory metric shows memory leaks and Java throws an Out of Memory exception.
See: Detecting memory leaks

Symptom: Throughput, response time, and scalability are undesirable.
See: If the application permits, exploit dynamic fragment caching

Tuning basics

A wide range of performance improvements can be made to WebSphere Application Server by using the tuning procedures described here. This guide explains how tuning works through general recommendations, descriptions of specific tuning methodologies, and hints and tips on the factors and variables that can influence performance.

Use the tuning guide, along with its examples and resources, to expand your tuning experience. Tuning is an ongoing learning process. Results can vary from those presented in this guide.

What influences tuning?

Many components can affect the performance of WebSphere Application Server, including hardware capacity and settings, operating system settings, the Web server, the WebSphere application server process, the Java virtual machine, and the database. Each has its own tuning options, varying in importance and impact, and each is explained in detail in the Individual performance parameters section of this document.

For convenience, some procedures are described for setting parameters in other products. Because these products can change, consider these procedures as suggestions.

Types of tuning

There are two types of tuning: application tuning and parameter tuning.

Although application tuning sometimes offers the greatest tuning improvements, this document focuses on individual performance parameters and synergy between them.

The white paper, WebSphere Application Server Development Best Practices for Performance and Scalability, addresses application tuning by describing development best practices both for Web applications containing servlets, JavaServer Pages (JSP) files, and Java Database Connectivity (JDBC) code, and for enterprise applications containing enterprise bean components.

Parameter tuning

Parameter tuning is the art of changing WebSphere Application Server settings with the goal of improving performance. The values suggested in this document are general guidelines. The optimal settings for your environment can vary significantly. In addition, remember that after tuning one bottleneck away, you can encounter another, unrelated bottleneck. If so, you might not experience the performance improvement until both bottlenecks have been removed.

This section discusses two kinds of tuning parameters:
Tuning parameters with a high impact
These parameters can have a significant effect on performance. They are a subset of all the parameters and have the largest impact. Because tuning is application dependent, the appropriate parameter settings for your application and environment might differ.

The following high-impact tuning parameters are covered:

Application assembly performance checklist
Adjusting WebSphere Application Server system queues
Using pass-by-value versus pass-by-reference (NoLocalCopies)
Adjusting Solaris TCP parameters
Tuning Java memory
Adjusting MaxRequestsPerChild on Linux with IBM HTTP Server
Adjusting WebSphere data source connection pool size
Adjusting prepared statement cache size
Web server configuration reload interval
Tuning parameters for avoiding failures

The following parameters help to prevent functional problems:

ListenBackLog parameter: Applies if running Windows NT/2000 with IIS under heavy client load
Number of connections to DB2: Applies if you establish more connections than DB2 sets up by default
Allow thread allocation beyond maximum: Applies if this option has been selected and the system is overloaded because too many threads are allocated
Use TCP sockets for DB2 on Linux: Applies to local databases
WebSphere data source connection pool size: Be sure to have enough connections to handle the extra connections required for transaction processing with entity enterprise beans and to avoid deadlock

Adjusting WebSphere Application Server system queues

WebSphere Application Server has a series of interrelated components that must be harmoniously tuned to support the custom needs of your end-to-end e-business application. These adjustments help the system achieve maximum throughput while maintaining the overall stability of the system.

WebSphere queuing network

WebSphere Application Server establishes a queuing network, which is a network of interconnected queues that represent the various components of the application serving platform. These queues include the network, Web server, Web container, EJB container, data source and possibly a connection manager to a custom back-end system. Each of these resources represents a queue of requests waiting to use that resource.

The WebSphere queues are load-dependent resources. The average service time of a request depends on the number of concurrent clients.

Closed queues versus open queues

Most of the queues that make up the queuing network are closed queues. In contrast with an open queue, a closed queue places a limit on the maximum number of requests present in the queue.

A closed queue allows system resources to be tightly managed. For example, the Web container's Max Connections setting controls the size of the Web container queue. If the average servlet running in a Web container creates 10MB of objects during each request, then a value of 100 for max connections would limit the memory consumed by the Web container to 1GB.

In a closed queue, a request can be in one of two states: active or waiting. An active request is either doing work or waiting for a response from a downstream queue. For example, an active request in the Web server is either doing work (such as retrieving static HTML) or waiting for a request to complete in the Web container. In the waiting state, the request is waiting to become active. The request remains in the waiting state until one of the active requests leaves the queue.

All Web servers supported by WebSphere Application Server are closed queues, as are WebSphere Application Server data sources. Web containers can be configured as either open or closed queues. In general, it is best to make them closed queues. EJB containers are open queues; if there are no threads available in the pool, a new one will be created for the duration of the request.
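The closed-queue behavior just described, a fixed cap on active requests with the overflow held in a waiting state, can be sketched with a counting semaphore. This is an illustration of the concept only, not WebSphere's implementation; the class and method names are invented for the example.

```java
import java.util.concurrent.Semaphore;

// Sketch of a closed queue: at most maxConnections requests are active at
// once; callers beyond the cap block in the waiting state until an active
// request leaves the queue and releases its permit.
public class ClosedQueue {
    private final Semaphore permits;

    public ClosedQueue(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    public void handle(Runnable request) {
        permits.acquireUninterruptibly(); // wait until the queue has room
        try {
            request.run();                // request is now "active"
        } finally {
            permits.release();            // leaving the queue admits a waiter
        }
    }

    public int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        ClosedQueue queue = new ClosedQueue(2);
        queue.handle(() -> System.out.println("request served"));
        System.out.println("free slots: " + queue.available());
    }
}
```

An open queue, by contrast, would simply spawn a new thread when no permit is free, which is why its resource use is unbounded.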

If enterprise beans are being called by servlets, the Web container limits the number of total concurrent requests into an EJB container because the Web container also has a limit. This is true only if enterprise beans are called from the servlet thread of execution. Nothing prevents you from creating threads and bombarding the EJB container with requests. This is one reason why it is not a good idea for a servlet to create its own work threads.

Queue settings in WebSphere Application Server

The following outlines the various queue settings:

Determining the settings

The following section outlines a methodology for configuring the WebSphere Application Server queues. The dynamics of the system, and therefore the appropriate tuning parameters, change whenever resources are moved (for example, moving the database server onto another machine) or more powerful resources are provided, such as a faster set of CPUs with more memory. Thus, adjust the tuning parameters to the specific configuration of the production environment.

Queuing before WebSphere

The first rule of tuning is to minimize the number of requests in WebSphere Application Server queues. In general, it is better for requests to wait in the network (in front of the Web server) than to wait in WebSphere Application Server. This configuration allows only requests that are ready to be processed into the queuing network. To accomplish this, make the queues furthest upstream (closest to the client) slightly larger, and make the queues further downstream (furthest from the client) progressively smaller.

The queues in this queuing network are progressively smaller as work flows downstream. When 200 clients arrive at the Web server, 125 requests remain queued in the network because the Web server is set to handle 75 concurrent clients. As the 75 requests pass from the Web server to the Web container, 25 remain queued in the Web server and the remaining 50 are handled by the Web container. This process progresses through the data source until 25 users arrive at the final destination, the database server. Because, at each point upstream, there is some work waiting to enter a component, no component in this system has to wait for work to arrive. The bulk of the requests wait in the network, outside of WebSphere Application Server. This adds stability because no component is overloaded. Routing software such as IBM Network Dispatcher can be used to direct waiting users to other servers in a WebSphere Application Server cluster.
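The arithmetic in this example can be checked with a short calculation. The class and method names below are invented for illustration; the code simply walks the chain of queue limits and reports how many requests wait in front of each stage.

```java
// For a chain of progressively smaller queue limits, compute how many
// requests wait in front of each stage when `clients` concurrent requests
// arrive. waits[0] is the count left queued in the network in front of the
// first stage (the Web server), waits[1] in front of the Web container, etc.
public class QueueCascade {
    public static int[] waiting(int clients, int... limits) {
        int[] waits = new int[limits.length];
        int arriving = clients;
        for (int i = 0; i < limits.length; i++) {
            int admitted = Math.min(arriving, limits[i]);
            waits[i] = arriving - admitted; // left queued in front of stage i
            arriving = admitted;            // admitted requests flow downstream
        }
        return waits;
    }

    public static void main(String[] args) {
        // Web server 75, Web container 50, data source 25, as in the text
        int[] w = waiting(200, 75, 50, 25);
        System.out.printf("network: %d, web server: %d, web container: %d%n",
                          w[0], w[1], w[2]);
        // -> network: 125, web server: 25, web container: 25
    }
}
```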

Drawing a throughput curve

Using a test case that represents the full spirit of the production application (for example, it exercises all meaningful code paths) or using the production application itself, run a set of experiments to determine when the system capabilities are maximized (the saturation point). Conduct these tests after most of the bottlenecks have been removed from the application. The typical goal of these tests is to drive CPUs to near 100% utilization.

Start the initial baseline experiment with large queues. This allows maximum concurrency through the system. For example, start the first experiment with a queue size of 100 at each of the servers in the queuing network: Web server, Web container and data source.

Next, begin a series of experiments to plot a throughput curve, increasing the concurrent user load after each experiment. For example, perform experiments with 1 user, 2 users, 5, 10, 25, 50, 100, 150 and 200 users. After each run, record the throughput (requests per second) and response times (seconds per request).
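The experiment loop above can be sketched as follows. The workload is a stand-in for real client requests and all names are invented for the example; a real test would drive HTTP requests against the application instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the throughput-curve experiment: for each concurrency level,
// drive a fixed number of requests through a thread pool and record
// throughput (requests/second) and approximate response time (seconds/request).
public class ThroughputCurve {

    public static final class Point {
        public final int users;
        public final double reqPerSec;
        public final double secPerReq;
        Point(int users, double reqPerSec, double secPerReq) {
            this.users = users;
            this.reqPerSec = reqPerSec;
            this.secPerReq = secPerReq;
        }
    }

    // Stand-in for one client request against the application under test.
    static void simulatedRequest() {
        try { Thread.sleep(5); } catch (InterruptedException ignored) { }
    }

    public static Point measure(int users, int totalRequests) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        long start = System.nanoTime();
        List<Future<?>> results = new ArrayList<>();
        for (int i = 0; i < totalRequests; i++) {
            results.add(pool.submit(ThroughputCurve::simulatedRequest));
        }
        for (Future<?> f : results) {
            f.get(); // wait for every request to finish
        }
        double elapsed = (System.nanoTime() - start) / 1e9;
        pool.shutdown();
        // With `users` requests in flight, mean response time is roughly the
        // elapsed time divided by the number of requests each thread completed.
        return new Point(users, totalRequests / elapsed,
                         elapsed * users / totalRequests);
    }

    public static void main(String[] args) throws Exception {
        for (int users : new int[]{1, 2, 5, 10, 25}) {
            Point p = measure(users, 50);
            System.out.printf("%3d users: %7.1f req/s, %.4f s/req%n",
                              p.users, p.reqPerSec, p.secPerReq);
        }
    }
}
```

Plotting the recorded points against the user counts produces the throughput curve discussed next.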

The curve resulting from the baseline experiments should resemble the typical throughput curve shown as follows:

The throughput of WebSphere Application Server is a function of the number of concurrent requests present in the total system. Section A, the light load zone, shows that, as the number of concurrent user requests increases, the throughput increases almost linearly with the number of requests. This reflects that, at light loads, concurrent requests face very little congestion within the WebSphere Application Server system queues. At some point, congestion starts to develop and throughput increases at a much lower rate until it reaches a saturation point that represents the maximum throughput value, as determined by some bottleneck in the WebSphere Application Server system. The most manageable type of bottleneck occurs when the CPUs of the WebSphere Application Server machines become saturated, because a CPU bottleneck is remedied by adding additional or more powerful CPUs.

In the heavy load zone, or Section B, as the concurrent client load increases, throughput remains relatively constant. However, the response time increases proportionally to the user load; that is, if the user load is doubled in the heavy load zone, the response time doubles. At some point, represented by Section C, the buckle zone, one of the system components becomes exhausted and throughput starts to degrade. For example, the system might enter the buckle zone when the network connections at the Web server exhaust the limits of the network adapter or when the operating system limit for file handles is exceeded.

If the saturation point is reached by driving the use of the system CPUs close to 100%, move on to the next step. If the CPU is not driven to 100%, there is likely a bottleneck that is being aggravated by the application. For example, the application might be creating Java objects excessively, causing garbage collection bottlenecks in Java.

There are two ways to manage application bottlenecks: remove the bottleneck or clone the bottleneck. The best way to manage a bottleneck is to remove it. Use a Java-based application profiler, such as Performance Trace Data Visualizer (PTDV), JProbe, or Jinsight, to examine overall object utilization.

Queue adjustments

The number of concurrent users at the throughput saturation point represents the maximum concurrency of the application. For example, if the application saturated WebSphere Application Server at 50 users, you might find that 48 users gave the best combination of throughput and response time. This value is called the Max Application Concurrency value. Max Application Concurrency becomes the value to use as the basis for adjusting the WebSphere Application Server system queues. Remember, it is desirable for most users to wait in the network, therefore, decrease queue sizes while moving downstream farther from the client. For example, given a Max Application Concurrency value of 48, start with system queues at the following values: Web server 75, Web container 50, data source 45. Perform a set of additional experiments adjusting these values slightly higher and lower to find the best settings.

The Resource Analyzer can be used to determine the number of concurrent users through the servlet engine thread pool concurrently active threads metric.

In performance experiments, throughput increased 10-15% when the Web container transport Maximum Keep-Alive setting was adjusted to match the maximum number of Web container threads.

Adjusting queue settings for access patterns

In many cases, only a fraction of the requests passing through one queue enters the next queue downstream. In a site with many static pages, many requests are fulfilled at the Web server and are not passed to the Web container. In this circumstance, the Web server queue can be significantly larger than the Web container queue. In the previous section, the Web server queue was set to 75 rather than closer to the value of Max Application Concurrency. Similar adjustments need to be made when different components have different execution times.

For example, in an application that spends 90% of its time in a complex servlet and only 10% making a short JDBC query, on average 10% of the servlets are using database connections at any time, so the database connection queue can be significantly smaller than the Web container queue. Conversely, if much of a servlet execution time is spent making a complex query to a database, consider increasing the queue values at both the Web container and the data source. Always monitor the CPU and memory utilization for both the WebSphere Application Server and the database servers to ensure the CPU or memory are not being saturated.
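The 90%/10% example amounts to scaling a downstream queue by the fraction of request time spent in that resource. A rough calculation of that rule follows; the names and numbers are illustrative only, and measured utilization should always take precedence over the estimate.

```java
// Rough sizing rule from the access-pattern discussion: if only a fraction
// of each request's time is spent in a downstream resource, that resource's
// queue can be scaled down by roughly that fraction.
public class AccessPatternSizing {
    // e.g. webContainerQueue = 50, jdbcTimeFraction = 0.10 -> pool of ~5
    public static int dataSourcePool(int webContainerQueue,
                                     double jdbcTimeFraction) {
        return Math.max(1, (int) Math.ceil(webContainerQueue * jdbcTimeFraction));
    }

    public static void main(String[] args) {
        // Servlet spends 90% of its time in computation, 10% in a JDBC query
        System.out.println(dataSourcePool(50, 0.10)); // -> 5
    }
}
```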

Queuing and enterprise beans

Method invocations to enterprise beans are queued only if the client making the method call is remote, for example, when the EJB client is running in a separate Java virtual machine (another address space) from the enterprise bean. In contrast, if the EJB client (either a servlet or another enterprise bean) is installed in the same JVM, the EJB method runs on the same thread of execution as the EJB client and there is no queuing.

Remote enterprise beans communicate by using the RMI/IIOP protocol. Method invocations initiated over RMI/IIOP are processed by a server-side ORB. The thread pool acts as a queue for incoming requests. However, if a remote method request is issued and there are no available threads in the thread pool, a new thread is created. After the method request completes, the thread is destroyed. Therefore, when the ORB is used to process remote method requests, the EJB container is an open queue, because its use of threads is unbounded. The following illustration depicts the two queuing options of enterprise beans.

When configuring the thread pool, it is important to understand the calling patterns of the EJB client. If a servlet is making a small number of calls to remote enterprise beans and each method call is relatively quick, consider setting the number of threads in the ORB thread pool to a value lower than the Web container thread pool size value.

Resource Analyzer shows a metric called percent maxed, which indicates how much of the time all of the configured threads are in use. If this value is consistently in the double digits, the ORB could be a bottleneck and the number of threads should be increased.

The degree to which the ORB thread pool value needs to be increased is a function of the number of simultaneous servlets (that is, clients) calling enterprise beans and the duration of each method call. If the method calls are longer, consider making the ORB thread pool size equal to the Web container size, because there is little interleaving of remote method calls. If the servlet makes only short-lived or quick calls to the ORB, several servlets can potentially reuse the same ORB thread; in this case, the ORB thread pool can be small, perhaps even one-half of the thread pool size setting of the Web container. If the application spends a lot of time in the ORB, configure a more even relationship between the Web container and the ORB thread pools.
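These guidelines can be restated as a simple heuristic, sketched below. The ratios come from the examples in the text (roughly half the Web container pool for quick calls, an equal pool for long-running calls) and are starting points for experimentation, not prescribed values.

```java
// Heuristic starting point for the ORB thread pool, derived from the
// guidance above: short-lived EJB calls allow thread reuse, so the ORB pool
// can be about half the Web container pool; long-running calls do not
// interleave, so the two pools should be roughly equal. Illustrative only.
public class OrbPoolSizing {
    public static int orbThreads(int webContainerThreads,
                                 boolean longRunningCalls) {
        return longRunningCalls ? webContainerThreads
                                : Math.max(1, webContainerThreads / 2);
    }

    public static void main(String[] args) {
        System.out.println(orbThreads(50, false)); // quick calls -> 25
        System.out.println(orbThreads(50, true));  // long calls  -> 50
    }
}
```

Whatever starting value is chosen, verify it against the percent maxed metric under realistic load before settling on a final size.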

Queuing and clustering

The capabilities for cloning application servers can be a valuable asset in configuring highly scalable production environments. This is especially true when the application is experiencing bottlenecks that are preventing full CPU utilization of Symmetric Multiprocessing (SMP) servers. When adjusting the WebSphere Application Server system queues in clustered configurations, remember that when a server is added to a cluster, the server downstream receives twice the load.

Consider a configuration in which two Web container clones sit between a Web server and a data source, with the Web server, servlet engines, and data source (but not the database) all running on a single SMP server. Given these constraints, the following queue considerations apply:

Tuning Secure Socket Layer

The following are the two types of Secure Socket Layer (SSL) performance: handshake performance and bulk encryption/decryption performance.

Overview of handshake and bulk encryption/decryption

When an SSL connection is established, an SSL handshake occurs. After a connection is made, SSL performs bulk encryption and decryption for each read-write. The performance cost of an SSL handshake is much larger than that of bulk encryption and decryption.

How to enhance SSL performance

To enhance SSL performance, decrease the number of individual SSL connections and handshakes.

Decreasing the number of connections increases performance for secure communication through SSL connections, as well as non-secure communication through simple TCP connections. One way to decrease individual SSL connections is to use a browser that supports HTTP 1.1. Decreasing individual SSL connections could be impossible for some users if they cannot upgrade to HTTP 1.1.

It is more common to decrease the number of connections (both TCP and SSL) between two WebSphere Application Server components. The following guidelines help to ensure the HTTP transport of the application server is configured so that the Web server plug-in does not repeatedly reopen new connections to the application server:

Prepared statement cache size

Other JDBC parameters
In addition to setting the prepared statement cache size, you can set other specific properties for JDBC drivers. For example, using Oracle, you can increase the number of rows to fetch while getting result sets with the following statement:
name="defaultRowPrefetch", value="25"
Enter these types of custom properties on the General tab for the database.
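The same name/value pair can also be passed programmatically when a connection is opened directly with the Oracle JDBC driver, as sketched below. defaultRowPrefetch is Oracle's documented connection property; the URL and credentials are placeholders, and in WebSphere the pair is entered as a data source custom property instead.

```java
import java.util.Properties;

// Sketch: supplying the Oracle-specific defaultRowPrefetch property for a
// JDBC connection. The credentials are placeholders for the example.
public class RowPrefetchExample {
    public static Properties connectionProperties() {
        Properties props = new Properties();
        props.setProperty("user", "scott");            // placeholder
        props.setProperty("password", "tiger");        // placeholder
        props.setProperty("defaultRowPrefetch", "25"); // rows per round trip
        return props;
    }

    public static void main(String[] args) {
        // A real application would pass the properties to the driver, e.g.:
        // Connection conn = DriverManager.getConnection(
        //     "jdbc:oracle:thin:@dbhost:1521:ORCL", connectionProperties());
        System.out.println(
            connectionProperties().getProperty("defaultRowPrefetch"));
    }
}
```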

DB2

DB2 has many parameters that can be configured to optimize database performance. For complete DB2 tuning information, refer to the DB2 UDB Administration Guide: Performance.
Use TCP sockets for DB2 on Linux
DB2 MaxAppls
DB2 MaxAgents
DB2 buffpage
DB2 query optimization level
DB2 reorgchk
DB2 MinCommit

Session management

For additional information on setting session management parameters, see the InfoCenter article about the session programming model and environment.

WebSphere Application Server Enterprise Extensions Message Listener

WebSphere Application Server Enterprise Extensions provides Extended Messaging Support. This section contains tuning suggestions for the JMS Listener function, which is part of Extended Messaging Support.

Maximum sessions

Multiple application servers listening on the same queue

Additional references

Performance tool procedures

Starting Windows NT/2000 Performance Monitor

Follow these steps:
From the Start menu, select Programs > Administrative Tools > Performance Monitor.