IBM WebSphere Application Server, Advanced Edition Tuning Guide
Performance tuner wizard
To invoke the wizard from the administrative console, select Console > Wizards > Performance Tuner.
Dynamic fragment caching
Dynamic fragment caching is the ability to cache the output of dynamic servlets and JSP files, a technology that dramatically improves application performance. This cache, working within the JVM of an application server, intercepts calls to a servlet service() method, checking whether it can serve the invocation from the cache rather than re-execute the servlet. Because J2EE applications have high read-write ratios and can tolerate a small degree of latency in the freshness of their data, fragment caching can create an opportunity for dramatic gains in server response time, throughput and scalability.
Once a servlet has been executed (generating the output that will be cached), a cache entry is generated containing that output. Side effects of the execution (that is, invocations of other servlets or JSP files) are also generated, as is metadata about the entry, including timeout and entry priority information. Unique entries are distinguished by an ID string generated from the HttpServletRequest object for each invocation of the servlet, so a servlet can be cached based on request parameters, the URI used to invoke the servlet, or session information. Because JSP files are compiled by WebSphere Application Server into servlets, the terms JSP files and servlets are used interchangeably (except when declaring elements within an XML file).
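As a rough illustration of how such an ID might be derived, consider the following sketch. It is not WebSphere's actual implementation; the class name and the productId parameter are hypothetical:

    import javax.servlet.http.HttpServletRequest;

    // Hypothetical sketch: deriving a fragment-cache ID from the request URI,
    // a request parameter and the session. WebSphere's real ID generation is
    // internal; this only illustrates what can distinguish cache entries.
    public class CacheIdSketch {
        public static String buildId(HttpServletRequest req) {
            StringBuffer id = new StringBuffer(req.getRequestURI());
            String product = req.getParameter("productId");   // hypothetical parameter
            if (product != null) {
                id.append("?productId=").append(product);
            }
            if (req.getSession(false) != null) {
                id.append(";session=").append(req.getSession(false).getId());
            }
            return id.toString();
        }
    }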
Symptom | Additional information
Throughput and response time are undesirable. | Processor speed
AIX: Memory allocation error. Solaris: Too many files open. | AIX file descriptors (ulimit) or Solaris file descriptors (ulimit)
Solaris: The server stalls during peak periods, responses take minutes, processor utilization remains high with all activity in the system processes, and netstat shows many sockets open to port 80 in CLOSE_WAIT or FIN_WAIT_2 state. | Solaris tcp_time_wait_interval and Solaris tcp_fin_wait_2_flush_interval
Windows NT/2000: Netstat shows too many sockets in TIME_WAIT. | Windows NT/2000 TcpTimedWaitDelay
Throughput is undesirable and the application server priority has not been adjusted. | Adjusting the operating system priority of the WebSphere Application Server process
Under load, client requests do not arrive at the Web server because they time out or are rejected. | For HP-UX 11, see HP-UX 11 tcp_conn_request_max. For IIS on Windows NT/2000, see the ListenBackLog parameter. For IBM HTTP Server on NT, see ListenBackLog.
Windows NT/2000: WebSphere Application Server performance decreased after an application server from another vendor was installed. | IIS Permission Properties
Resource Analyzer Percent Maxed metric indicates that the Web container thread pool is too small. | Web container maximum thread size
Netstat shows too many TIME_WAIT state sockets for port 9080. | Web container transport maximum Keep-Alive
Too much disk input/output occurs due to paging. | Heap size settings
Resource Analyzer Percent Used metric for a data source connection pool indicates the pool size is too large. | WebSphere data source connection pool size
Resource Analyzer Prepared Statement Cache Discards metric indicates the data source prepared statement cache size is too small. | Prepared statement cache size
Too much disk input/output occurs due to DB2 writing log records. | DB2 MinCommit
Resource Analyzer Percent Maxed metric indicates the Object Request Broker thread pool is too small. | Queuing and enterprise beans
Resource Analyzer Java Virtual Machine Profiler Interface (JVMPI) data indicates over-utilization of objects when too much time is being spent in garbage collection. | Detecting over-utilization of objects
Resource Analyzer Used Memory metric shows memory leaks and Java throws an Out of Memory exception. | Detecting memory leaks
Throughput, response time and scalability are undesirable. | If the application permits, exploit dynamic fragment caching.
A wide range of performance improvements can be made to WebSphere Application Server using the available tuning procedures. This tuning guide explains how tuning works by giving general recommendations and describing specific tuning methodologies, and it offers hints and tips on the various factors and variables that can enhance performance tuning.
Use the tuning guide, along with its examples and resources, to expand your tuning experience. Tuning is an ongoing learning process. Your results can vary from those presented in this guide.
For convenience, some procedures are described for setting parameters in other products. Because these products can change, consider these procedures as suggestions.
There are two types of tuning: application tuning and parameter tuning.
Although application tuning sometimes offers the greatest tuning improvements, this document focuses on individual performance parameters and synergy between them.
The white paper, WebSphere Application Server Development Best Practices for Performance and Scalability, addresses application tuning by describing development best practices for both Web applications (containing servlets, JavaServer Pages (JSP) files, and Java Database Connectivity (JDBC) code) and enterprise applications containing enterprise bean components.
The following table lists tuning parameters that can substantially enhance performance:
The following parameters help to prevent functional problems:
- ListenBackLog parameter: applies if you are running Windows NT/2000 with IIS under heavy client load.
- Number of connections to DB2: applies if you establish more connections than DB2 sets up by default.
- Allow thread allocation beyond maximum: applies if this option has been selected and the system is overloaded because too many threads are allocated.
- Using TCP sockets for DB2 on Linux: applies to local databases.
- WebSphere data source connection pool size: be sure to have enough connections to handle the extra connections required for transaction processing with entity EJBs and to avoid deadlock.
WebSphere Application Server has a series of interrelated components that must be harmoniously tuned to support the custom needs of your end-to-end e-business application. These adjustments help the system achieve maximum throughput while maintaining the overall stability of the system.
WebSphere Application Server establishes a queuing network, which is a network of interconnected queues that represent the various components of the application serving platform. These queues include the network, Web server, Web container, EJB container, data source and possibly a connection manager to a custom back-end system. Each of these resources represents a queue of requests waiting to use that resource.
The WebSphere queues are load-dependent resources. The average service time of a request depends on the number of concurrent clients.
Most of the queues that make up the queuing network are closed queues. In contrast with an open queue, a closed queue places a limit on the maximum number of requests present in the queue.
A closed queue allows system resources to be tightly managed. For example, the Web container's Max Connections setting controls the size of the Web container queue. If the average servlet running in a Web container creates 10MB of objects during each request, then a value of 100 for max connections would limit the memory consumed by the Web container to 1GB.
In a closed queue, a request can be in one of two states: active or waiting. An active request is either doing work or waiting for a response from a downstream queue. For example, an active request in the Web server is either doing work (such as retrieving static HTML) or waiting for a request to complete in the Web container. In the waiting state, the request is waiting to become active. The request remains in the waiting state until one of the active requests leaves the queue.
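The two states can be modeled with a counting semaphore. The following is a conceptual sketch only, not WebSphere code; the limit stands in for a setting such as Max Connections:

    import java.util.concurrent.Semaphore;

    // Conceptual sketch of a closed queue: at most maxConnections requests are
    // active at once; the rest wait until an active request leaves the queue.
    public class ClosedQueueSketch {
        private final Semaphore slots;

        public ClosedQueueSketch(int maxConnections) {
            slots = new Semaphore(maxConnections);
        }

        public void handle(Runnable request) throws InterruptedException {
            slots.acquire();       // waiting state: blocks while the queue is full
            try {
                request.run();     // active state: doing work or waiting downstream
            } finally {
                slots.release();   // leaving the queue admits a waiting request
            }
        }
    }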
All Web servers supported by WebSphere Application Server are closed queues, as are WebSphere Application Server data sources. Web containers can be configured as either open or closed queues. In general, it is best to make them closed queues. EJB containers are open queues; if there are no threads available in the pool, a new one will be created for the duration of the request.
If enterprise beans are being called by servlets, the Web container limits the number of total concurrent requests into an EJB container because the Web container also has a limit. This is true only if enterprise beans are called from the servlet thread of execution. Nothing prevents you from creating threads and bombarding the EJB container with requests. This is one reason why it is not a good idea for a servlet to create its own work threads.
The following outlines the various queue settings:
The following section outlines a methodology for configuring the WebSphere Application Server queues. The dynamics of the system, and therefore the appropriate tuning parameters, change when resources are moved (for example, moving the database server onto another machine) or replaced by more powerful resources, such as a faster set of CPUs with more memory. Thus, adjust the tuning parameters to the specific configuration of the production environment.
The first rule of tuning is to minimize the number of requests in WebSphere Application Server queues. In general, it is better for requests to wait in the network (in front of the Web server) than it is for them to wait in WebSphere Application Server. This configuration will result in allowing into the queuing network only requests that are ready to be processed. To accomplish this configuration, specify the queues furthest upstream (closest to the client) to be slightly larger, and specify the queues further downstream (furthest from the client) to be progressively smaller.
The queues in this queuing network are progressively smaller as work flows downstream. When 200 clients arrive at the Web server, 125 requests remain queued in the network because the Web server is set to handle 75 concurrent clients. As the 75 requests pass from the Web server to the Web container, 25 remain queued in the Web server and the remaining 50 are handled by the Web container. This process progresses through the data source until 25 users arrive at the final destination, the database server. Because, at each point upstream, there is some work waiting to enter a component, no component in this system must wait for work to arrive. The bulk of the requests wait in the network, outside of WebSphere Application Server. This adds stability because no component is overloaded. Routing software such as IBM Network Dispatcher can be used to direct waiting users to other servers in a WebSphere Application Server cluster.
Using a test case that represents the full spirit of the production application (for example, it exercises all meaningful code paths) or using the production application itself, run a set of experiments to determine when the system capabilities are maximized (the saturation point). Conduct these tests after most of the bottlenecks have been removed from the application. The typical goal of these tests is to drive CPUs to near 100% utilization.
Start the initial baseline experiment with large queues. This allows maximum concurrency through the system. For example, start the first experiment with a queue size of 100 at each of the servers in the queuing network: Web server, Web container and data source.
Next, begin a series of experiments to plot a throughput curve, increasing the concurrent user load after each experiment. For example, perform experiments with 1 user, 2 users, 5, 10, 25, 50, 100, 150 and 200 users. After each run, record the throughput (requests per second) and response times (seconds per request).
The curve resulting from the baseline experiments should resemble the typical throughput curve shown as follows:
The throughput of WebSphere Application Server is a function of the number of concurrent requests present in the total system. Section A, the light load zone, shows that, as the number of concurrent user requests increases, the throughput increases almost linearly with the number of requests. This reflects that, at light loads, concurrent requests face very little congestion within the WebSphere Application Server system queues. At some point, congestion starts to develop and throughput increases at a much lower rate until it reaches a saturation point that represents the maximum throughput value, as determined by some bottleneck in the WebSphere Application Server system. The most manageable type of bottleneck is when the CPUs of the WebSphere Application Server machines become saturated. This is desirable because a CPU bottleneck is remedied by adding additional or more powerful CPUs.
In the heavy load zone, or Section B, as the concurrent client load increases, throughput remains relatively constant. However, the response time increases proportionally to the user load. That is, if the user load is doubled in the heavy load zone, the response time doubles. At some point, represented by Section C, the buckle zone, one of the system components becomes exhausted. At this point, throughput starts to degrade. For example, the system might enter the buckle zone when the network connections at the Web server exhaust the limits of the network adapter or when the operating system limits for file handles are exceeded.
If the saturation point is reached by driving the use of the system CPUs close to 100%, move on to the next step. If the CPU is not driven to 100%, there is likely a bottleneck that is being aggravated by the application. For example, the application might be creating Java objects excessively, causing garbage collection bottlenecks in Java.
There are two ways to manage application bottlenecks: remove the bottleneck or clone it. The best way to manage a bottleneck is to remove it. Use a Java-based application profiler to examine overall object utilization. Profilers such as Performance Trace Data Visualizer (PTDV), JProbe and Jinsight can be used.
The number of concurrent users at the throughput saturation point represents the maximum concurrency of the application. For example, if the application saturated WebSphere Application Server at 50 users, you might find that 48 users gave the best combination of throughput and response time. This value is called the Max Application Concurrency value, and it becomes the basis for adjusting the WebSphere Application Server system queues. Remember, it is desirable for most users to wait in the network; therefore, decrease queue sizes while moving downstream, farther from the client. For example, given a Max Application Concurrency value of 48, start with system queues at the following values: Web server 75, Web container 50, data source 45. Perform a set of additional experiments adjusting these values slightly higher and lower to find the best settings.
The Resource Analyzer can be used to determine the number of concurrent users through the servlet engine thread pool's Concurrently Active Threads metric.
In performance experiments, throughput has increased by 10-15% when the Web container transport maximum Keep-Alive setting is adjusted to match the maximum number of Web container threads.
In many cases, only a fraction of the requests passing through one queue enters the next queue downstream. In a site with many static pages, many requests are fulfilled at the Web server and are not passed to the Web container. In this circumstance, the Web server queue can be significantly larger than the Web container queue. In the previous section, the Web server queue was set to 75 rather than closer to the value of Max Application Concurrency. Similar adjustments need to be made when different components have different execution times.
For example, in an application that spends 90% of its time in a complex servlet and only 10% making a short JDBC query, on average 10% of the servlets are using database connections at any time, so the database connection queue can be significantly smaller than the Web container queue. Conversely, if much of a servlet execution time is spent making a complex query to a database, consider increasing the queue values at both the Web container and the data source. Always monitor the CPU and memory utilization for both the WebSphere Application Server and the database servers to ensure the CPU or memory are not being saturated.
Method invocations to enterprise beans are queued only if the client making the method call is remote, for example, when the EJB client runs in a separate Java Virtual Machine (another address space) from the enterprise bean. In contrast, if the EJB client (either a servlet or another enterprise bean) is installed in the same JVM, the EJB method runs on the same thread of execution as the EJB client and there is no queuing.
Remote enterprise beans communicate by using the RMI/IIOP protocol. Method invocations initiated over RMI/IIOP are processed by a server-side ORB. The thread pool acts as a queue for incoming requests. However, if a remote method request is issued and there are no more available threads in the thread pool, a new thread is created. After the method request completes the thread is destroyed. Therefore, when the ORB is used to process remote method requests, the EJB container is an open queue, because its use of threads is unbounded. The following illustration depicts the two queuing options of enterprise beans.
When configuring the thread pool, it is important to understand the calling patterns of the EJB client. If a servlet is making a small number of calls to remote enterprise beans and each method call is relatively quick, consider setting the number of threads in the ORB thread pool to a value lower than the Web container thread pool size value.
Resource Analyzer shows a metric called Percent Maxed, which is used to determine how much of the time all of the configured threads are in use. If this value is consistently in the double digits, the ORB could be a bottleneck and the number of threads should be increased.
The degree to which the ORB thread pool value needs to be increased is a function of the number of simultaneous servlets (that is, clients) calling enterprise beans and the duration of each method call. If the method calls are longer, consider making the ORB thread pool size equal to the Web container size because there is little interleaving of remote method calls. If the servlet makes only short-lived or quick calls to the ORB, several servlets can potentially reuse the same ORB thread; in this case, the ORB thread pool can be small, perhaps even one-half of the thread pool size setting of the Web container. If the application spends a lot of time in the ORB, configure a more even relationship between the Web container and the ORB.
The capabilities for cloning application servers can be a valuable asset in configuring highly scalable production environments. This is especially true when the application is experiencing bottlenecks that are preventing full CPU utilization of Symmetric Multiprocessing (SMP) servers. When adjusting the WebSphere Application Server system queues in clustered configurations, remember that when a server is added to a cluster, the server downstream receives twice the load.
Two Web container clones are located between a Web server and a data source. It is assumed the Web server, servlet engines and data source (but not the database) are all running on a single SMP server. Given these constraints, the following queue considerations need to be made:
When an SSL connection is established, an SSL handshake occurs. After a connection is made, SSL performs bulk encryption and decryption for each read-write. The performance cost of an SSL handshake is much larger than that of bulk encryption and decryption.
In order to enhance SSL performance, the number of individual SSL connections and handshakes must be decreased.
Decreasing the number of connections increases performance for secure communication through SSL connections, as well as non-secure communication through simple TCP connections. One way to decrease individual SSL connections is to use a browser that supports HTTP 1.1. Decreasing individual SSL connections could be impossible for some users if they cannot upgrade to HTTP 1.1.
It is more common to decrease the number of connections (both TCP and SSL) between two WebSphere Application Server components. The following guidelines help to ensure the HTTP transport of the application server is configured so that the Web server plug-in does not repeatedly reopen new connections to the application server:
Hardware accelerators currently supported by WebSphere Application Server only increase the SSL handshake performance, not the bulk encryption/decryption. An accelerator typically only benefits the Web server because Web server connections are short-lived. All other SSL connections in WebSphere Application Server are long-lived; these connections do not benefit from a hardware device which only accelerates SSL handshakes.
The performance of a cipher suite differs between software and hardware. Just because a cipher suite performs better in software does not mean it will perform better with hardware. Some algorithms are typically inefficient in hardware (for example, DES and 3DES); however, specialized hardware can provide efficient implementations of these same algorithms.
The performance of bulk encryption and decryption is affected by the cipher suite used for an individual SSL connection. The test software that generated the following data used IBM JSSE for both the client and server software, with no crypto hardware support. The test did not include the time to establish a connection, but only the time to transmit data through an established connection. Therefore, the data reveals the relative SSL performance of various cipher suites for long-running connections.
Before establishing a connection, the client enabled a single cipher suite for each test case. After the connection was established, the client timed how long it took to write an Integer to the server and for the server to write the specified number of bytes back to the client. Varying the amount of data had negligible effects on the relative performance of the cipher suites. The following chart shows the performance of each cipher suite.
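The per-suite setup described above can be reproduced with standard JSSE calls, as in this minimal sketch; the host, port and chosen suite are placeholders:

    import javax.net.ssl.SSLSocket;
    import javax.net.ssl.SSLSocketFactory;

    // Sketch: restrict a client socket to a single cipher suite before the
    // handshake, then time data written through the established connection.
    public class CipherSuiteTest {
        public static void main(String[] args) throws Exception {
            SSLSocketFactory f = (SSLSocketFactory) SSLSocketFactory.getDefault();
            SSLSocket s = (SSLSocket) f.createSocket("testhost", 8443); // placeholders
            s.setEnabledCipherSuites(new String[] { "SSL_RSA_WITH_RC4_128_MD5" });
            s.startHandshake();   // handshake cost is excluded from the bulk timing
            long start = System.currentTimeMillis();
            s.getOutputStream().write(new byte[1024]);   // timed bulk transfer
            System.out.println("Write took " + (System.currentTimeMillis() - start) + " ms");
            s.close();
        }
    }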
An analysis of the above data reveals the following:
The checklist includes these settings:
Commit option A provides maximum enterprise bean performance by caching database data outside of the transaction scope. Generally, commit option A is only applicable where the EJB container has exclusive access to the given database. Otherwise, data integrity is compromised. Commit option B provides more aggressive caching of Entity EJB object instances, which can result in improved performance over commit option C, but also results in greater memory usage. Commit option C is the most common real-world configuration for Entity EJBs.
The settings of the Activate At and Load At properties govern which commit options are used.
Isolation level also plays an important role in performance. Higher isolation levels reduce performance by increasing row locking and database overhead while reducing data access concurrency. Various databases provide different behavior with respect to the isolation settings. In general, Repeatable Read is an appropriate setting for DB2 databases. Read Committed is generally used for Oracle. Oracle does not support Repeatable Read and translates this setting to its highest isolation level, Serializable.
Isolation level can be specified at the bean or method level. Therefore, it is possible to configure different isolation settings for various methods. This is an advantage when some methods require higher isolation than others, and can be used to achieve maximum performance while maintaining integrity requirements. However, isolation cannot change between method calls within a single enterprise bean transaction. A runtime exception will be thrown in this case.
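For container-managed beans the container applies the configured level itself, but the effect corresponds to this standard JDBC call, shown here as a sketch (conn is assumed to be an open connection):

    import java.sql.Connection;
    import java.sql.SQLException;

    // Sketch: the JDBC equivalent of applying a transaction isolation level.
    public class IsolationSketch {
        public static void apply(Connection conn) throws SQLException {
            // Repeatable Read, appropriate for DB2 per the guidance above...
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
            // ...or Read Committed, generally used for Oracle:
            // conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        }
    }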
Repeatable Read
This level prohibits dirty reads and nonrepeatable reads, but it allows phantom reads.
Read Committed
This level prohibits dirty reads, but allows nonrepeatable reads and phantom reads.
Read Uncommitted
This level allows dirty reads, nonrepeatable reads, and phantom reads.
The container uses the transaction isolation level attribute as follows:
If the client invokes a bean method from outside a transaction context, the container behaves in the same way as if the Not Supported transaction attribute was set. The client must call the method without a transaction context.
Mandatory
This legal value directs the container to always invoke the bean method within the transaction context associated with the client. If the client attempts to invoke the bean method without a transaction context, the container throws the javax.jts.TransactionRequiredException exception to the client. The transaction context is passed to any enterprise bean object or resource accessed by an enterprise bean method.
Enterprise bean clients that access these entity beans must do so within an existing transaction. For other enterprise beans, the enterprise bean or bean method must implement the Bean Managed value or use the Required or Requires New value. For non-enterprise bean EJB clients, the client must invoke a transaction by using the javax.transaction.UserTransaction interface.
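For such non-enterprise bean clients, demarcation follows the standard pattern sketched below; the JNDI name is the standard java:comp binding and the bean calls are placeholders:

    import javax.naming.InitialContext;
    import javax.transaction.UserTransaction;

    // Sketch: a non-EJB client starting its own transaction before invoking
    // entity beans that require an existing transaction context.
    public class ClientTransactionSketch {
        public void callBeans() throws Exception {
            UserTransaction tx = (UserTransaction)
                    new InitialContext().lookup("java:comp/UserTransaction");
            tx.begin();
            try {
                // ... invoke entity bean methods here ...
                tx.commit();
            } catch (Exception e) {
                tx.rollback();
                throw e;
            }
        }
    }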
Requires New
This legal value directs the container to always invoke the bean method within a new transaction context, regardless of whether the client invokes the method within or outside a transaction context. The transaction context is passed to any enterprise bean objects or resources that are used by this bean method.
Required
This legal value directs the container to invoke the bean method within a transaction context. If a client invokes a bean method from within a transaction context, the container invokes the bean method within the client transaction context. If a client invokes a bean method outside a transaction context, the container creates a new transaction context and invokes the bean method from within that context. The transaction context is passed to any enterprise bean objects or resources that are used by this bean method.
Supports
This legal value directs the container to invoke the bean method within a transaction context if the client invokes the bean method within a transaction. If the client invokes the bean method without a transaction context, the container invokes the bean method without a transaction context. The transaction context is passed to any enterprise bean objects or resources that are used by this bean method.
Not Supported
This legal value directs the container to invoke bean methods without a transaction context. If a client invokes a bean method from within a transaction context, the container suspends the association between the transaction and the current thread before invoking the method on the enterprise bean instance. The container then resumes the suspended association when the method invocation returns. The suspended transaction context is not passed to any enterprise bean objects or resources that are used by this bean method.
Bean Managed
This value notifies the container that the bean class directly handles transaction demarcation. This property can be specified only for session beans, not for individual bean methods.
Examining Java garbage collection can give insight into how the application is utilizing memory. Garbage collection is a Java strength. By taking the burden of memory management away from the application writer, Java applications are more robust than applications written in languages that do not provide garbage collection. This robustness applies as long as the application is not abusing objects. It is normal for garbage collection to consume anywhere from 5 to 20% of the total execution time of a properly functioning application. If not managed, garbage collection can be one of the biggest bottlenecks for an application, especially when running on SMP server machines.
Use the garbage collection and heap statistics in Resource Analyzer to evaluate application performance health. By monitoring garbage collection during the execution of a fixed workload, you gain insight into whether the application is over-utilizing objects; memory leaks and overuse of objects can both be detected this way.
For this type of investigation, set the minimum and maximum heap sizes to the same value. Choose a representative, repetitive workload that matches production usage as closely as possible (user errors and all). It is also important to allow the application to run several minutes until the application state stabilizes.
To ensure meaningful statistics, run the fixed workload until the state of the application is steady. This usually takes several minutes. To see if the application is overusing objects, look in Resource Analyzer at the counters for the JVMPI profiler. The average time between garbage collection calls should be 5 to 6 times the average duration of a single garbage collection. If it is not, the application is spending more than 15% of its time in garbage collection. Also look at the numbers of freed, allocated and moved objects.
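The 5-to-6-times guideline is equivalent to roughly 15% of elapsed time spent in collection, as this small sketch of the arithmetic shows (the sample figures are illustrative only):

    // Sketch: percentage of elapsed time spent in garbage collection, from the
    // average interval between collections and the average collection duration.
    public class GcTimeFraction {
        public static void main(String[] args) {
            double avgGcDuration = 0.2;    // seconds per collection (illustrative)
            double avgTimeBetween = 1.2;   // seconds between collections (6x the duration)
            double percent = 100.0 * avgGcDuration / (avgGcDuration + avgTimeBetween);
            System.out.println("Time in GC: " + percent + "%");  // about 14%, under the threshold
        }
    }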
If the information indicates a garbage collection bottleneck, there are two ways to clear it. The most cost-effective way is to optimize the application by implementing object caches and pools. Use a Java profiler to determine which objects to target. If the application cannot be optimized, adding memory, processors and clones might help. Additional memory allows each clone to maintain a reasonable heap size. Additional processors allow the clones to run in parallel.
Memory leaks in Java are a dangerous contributor to garbage collection bottlenecks. They are more damaging than memory overuse because a memory leak ultimately leads to system instability. Over time, garbage collection occurs more frequently until finally the heap is exhausted and Java fails with a fatal Out of Memory exception. Memory leaks occur when an unneeded object has references that are never deleted. This most commonly occurs in collection classes, such as Hashtable, because the table itself always has a reference to the object, even after real references have been deleted.
It is a common complaint that applications crash immediately after being deployed in the production environment. High workload is often the cause. This is especially true for leaking applications, where the high workload accelerates the leak until a memory allocation failure occurs.
Memory leak testing is a matter of magnifying numbers. Memory leaks are measured in terms of the number of bytes or kilobytes that cannot be garbage collected. The delicate task is to differentiate these amounts from the expected sizes of useful and unusable memory. This task is achieved more easily if the numbers are magnified, resulting in larger gaps and easier identification of inconsistencies. The following is a list of important conclusions about memory leaks:
Repetitive tests can be used at the system level or the module level. The advantage of modular testing is better control. When a module is designed so that everything that takes place within it is kept private and creates no external side effects, including memory usage, testing for memory leaks is much easier. First, the memory usage before running the module is recorded. Then a fixed set of test cases is run repeatedly. At the end of the test run, the current memory usage is recorded again and compared with the earlier value to see whether it has changed significantly. Remember, garbage collection must be forced when recording the actual memory usage. To do this, insert System.gc() in the module where you want garbage collection to occur, or use a profiling tool that forces this event to occur.
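A minimal sketch of such a modular test follows; runModule() and the iteration count stand in for the real module and workload:

    // Sketch: modular memory-leak test. Force a collection, record used memory,
    // run a fixed workload repeatedly, then compare memory usage afterward.
    public class ModuleLeakTest {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.gc();                                      // force GC before measuring
            long before = rt.totalMemory() - rt.freeMemory();

            for (int i = 0; i < 1000; i++) {                  // fixed, repeatable test cases
                runModule();                                  // hypothetical module under test
            }

            System.gc();                                      // force GC before measuring again
            long after = rt.totalMemory() - rt.freeMemory();
            System.out.println("Memory growth: " + (after - before) + " bytes");
        }

        private static void runModule() { /* module under test */ }
    }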
Consider the following when choosing which test cases to use for memory leak testing:
Resource Analyzer helps to determine if there is a memory leak. For best results, repeat experiments with increasing duration, like 1000, 2000, and 4000-page requests. The Resource Analyzer graph of used memory should have a sawtooth shape. Each drop on the graph corresponds to a garbage collection. There is a memory leak if one of the following occurs:
If heap consumption indicates a possible leak during a heavy workload (the application server is consistently near 100% CPU utilization), yet the heap appears to recover during a subsequent lighter or near-idle workload, this is an indication of heap fragmentation. Heap fragmentation can occur when the JVM is able to free sufficient objects to satisfy memory allocation requests during garbage collection cycles, but does not have the time to compact small free memory areas in the heap into larger contiguous spaces.
Another form of heap fragmentation occurs when small objects (less than 512 bytes) are freed. The objects are freed, but the storage is not recovered, resulting in memory fragmentation.
Heap fragmentation can be avoided by turning on the -Xcompactgc flag in the JVM advanced settings command-line arguments. This flag ensures that each garbage collection cycle eliminates fragmentation, but with a small performance penalty.
The Java heap parameters also influence the behavior of garbage collection. Increasing the heap size allows more objects to be created. Because a large heap takes longer to fill, the application runs longer before a garbage collection occurs. However, a larger heap also takes longer to compact, so each garbage collection takes longer.
The illustration represents three CPU profiles, each running a fixed workload with varying Java heap settings. In the middle profile, the initial and maximum heap sizes are both 128MB. There are four garbage collections, and the total time in garbage collection is about 15% of the total run. When the heap parameters are doubled to 256MB, as in the top profile, the work time between garbage collections increases. There are only three garbage collections, but the length of each also increases. In the third profile, the heap size is reduced to 64MB and exhibits the opposite effect. With a smaller heap, both the time between garbage collections and the time for each garbage collection are shorter. For all three configurations, the total time in garbage collection is approximately 15%. This illustrates an important concept about the Java heap and its relationship to object utilization: there is always a cost for garbage collection in Java applications.
Run a series of test experiments that vary the Java heap settings. For example, run experiments with 128MB, 192MB, 256MB, and 320MB. During each experiment, monitor the total memory usage. If you expand the heap too aggressively, paging can occur. (Use the vmstat command or the Windows NT/2000 Performance Monitor to check for paging.) If paging occurs, reduce the size of the heap or add more memory to the system. When all the runs are finished, compare the following statistics:
If the heap free settles at 85% or more, consider decreasing the maximum heap size values because the application server and the application are under-utilizing the memory allocated for heap.
Solaris tcp_time_wait_interval
Solaris tcp_fin_wait_2_flush_interval
Solaris tcp_keepalive_interval
Many other TCP parameters exist; changing them can affect performance in your environment. For more information about tuning the TCP/IP Stack, see the Web site Tuning your TCP/IP Stack and More.
Before the three TCP parameters were changed, the server was stalling during certain peak periods. The netstat command showed that many sockets open to port 80 were in the state CLOSE_WAIT or FIN_WAIT_2.
In both topologies, the Object Request Broker pass-by-reference is selected and the backend database is on its own dedicated machine.
Also, if the processor utilization of the four machines is near 100%, a fifth machine could be added. Or, if the Web server box is not running at capacity and the Web container processing is not heavy, try freeing the processors on the four machines by moving to Topology B.
The same relationship applies to the session manager number of connections.
The MaxAppls setting must be at least as high as the number of connections. If you are using the same database for session and data sources, MaxAppls needs to be the sum of the connection settings for the session manager and the data sources.
MaxAppls = (# of connections set for data source + # of connections in session manager) x # of clones
After calculating the MaxAppls settings for the WAS database and each of the application databases, ensure that the MaxAgents setting for DB2 is equal to or greater than the sum of all of the MaxAppls.
MaxAgents = sum of MaxAppls for all databases
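As a worked example with hypothetical numbers: with 30 data source connections, 15 session manager connections, and 3 clones, MaxAppls must be at least (30 + 15) x 3 = 135. If the application database needs the same amount, MaxAgents must be at least 270. The DB2 commands below use placeholder database names and these illustrative values:

    db2 update db cfg for WAS40 using MAXAPPLS 135
    db2 update dbm cfg using MAXAGENTS 270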
This section discusses considerations for selecting and configuring the hardware on which the application servers will run.
Allow at least 512MB memory for each processor.
See the white paper WebSphere Application Server Admin Best Practices for Performance and Scalability for more information regarding hostname resolution on the administrative client host.
This section discusses considerations for tuning the operating systems in the server environment.
Note: These two parameters should be used together when tuning WebSphere Application Server on a Windows NT/2000 operating system.
ulimit -n 2000
For large SMP machines with clones, issue the following command:
ulimit -n unlimited
Use the command ulimit -a to display the current values for all limitations on system resources.
ulimit -n 1024
Use ulimit -a to display the current values for all limitations on system resources.
The server can stall during certain peak periods. If this occurs, the netstat command will show that many of the sockets opened to port 80 are in the CLOSE_WAIT or FIN_WAIT_2 state. Visible delays have occurred for up to four minutes, during which the server did not send any responses, but CPU utilization stayed high, with all of the activity in system processes.
ndd -get /dev/tcp tcp_time_wait_interval
ndd -set /dev/tcp tcp_time_wait_interval 60000
The server can stall during peak periods. Using the netstat command indicated that many of the sockets opened to port 80 were in CLOSE_WAIT or FIN_WAIT_2 state. Visible delays have occurred for as many as four minutes, during which the server did not send any responses, but CPU utilization stayed high, with all of the activity in system processes.
ndd -get /dev/tcp tcp_fin_wait_2_flush_interval
ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
ndd -get /dev/tcp tcp_keepalive_interval
ndd -set /dev/tcp tcp_keepalive_interval 300000
Customers have reported success with modifying other Solaris TCP parameters, including the following:
tcp_conn_req_max_q
tcp_comm_hash_size
tcp_xmit_hiwat
Although significant performance differences have not been seen after raising these settings, the system might benefit.
Set via the /etc/system entry:
set semsys:seminfo_semume = 1024
Set via the /etc/system entry:
set semsys:seminfo_semopm = 200
HP-UX 11 settings can be modified to significantly improve WebSphere Application Server performance.
chatr +pi64M +pd64M /opt/WebSphere/AppServer/java/bin/PA_RISC2.0/native_threads/java
The command output provides the current operating system characteristics of the process executable.
See the Hewlett Packard Web page for more details about this change.
connect requests dropped due to full queue
ndd -set /dev/tcp tcp_conn_request_max 1024
See the following Hewlett Packard Web page for more details about this change.
Kernel parameter | WebSphere Application Server tuning | DB2 tuning | Oracle tuning
maxdsiz | Not adjusted | Not adjusted | Not adjusted
maxdsiz_64b | Not adjusted | Not adjusted | Not adjusted
maxuprc | 512 | |
maxfiles | 2,048 | |
maxfiles_lim | 2,048 | |
nkthreads | 10,000 | |
max_thread_proc | 2,048 | |
nproc | 1,028 | |
nflocks | 8,192 | |
ninode | 2,048 | |
nfile | 8,192 | |
msgseg | 32,767 | |
msgmnb | 65,535 | |
msgmax | 65,535 | |
msgtql | 1,024 | |
msgmap | 258 | |
msgmni | 256 | |
msgssz | 16 | |
semmni | 512 | 70 |
semmap | 514 | |
semmns | 1,024 | 200 |
semmnu | 1,024 | |
shmmax | 966,367,642 | 1 GB |
shmmseg | 16 | 10 |
shmmni | 300 | 100 |
Refer to the following Hewlett Packard Web page for more information on HP-UX 11 kernel parameters.
The WebSphere Application Server product provides plug-ins for several Web server brands and versions. Each Web server and operating system combination features specific tuning parameters that affect application performance.
This section discusses the performance tuning settings associated with the Web servers.
IHS is a multi-process, single-threaded server. For more information, see the Web page about tuning the IBM HTTP Server.
The default configuration of the iPlanet Web server, Enterprise Edition provides a single-process, multi-threaded server.
To tell if the Web server is being throttled, consult its perfdump statistics. Look at the following data:
Note: It might be necessary to check the permissions of the sePlugin:
Alleviate the condition by using the ListenBackLog parameter to increase the number of requests IIS keeps in its queue.
MaxPoolThreads, PoolThreadLimit
The IBM HTTP Server is easily configured. The default settings are usually acceptable.
How to view thread utilization: There are two ways to find how many threads are being used under load:
Follow these steps to use IBM HTTP Server server-status:
Each WebSphere Application Server process has several parameters influencing application performance. Each application server in the WebSphere Application Server product comprises an EJB container and a Web container.
Use the WebSphere Application Server administrative console to configure and tune applications, Web containers, EJB containers, application servers and nodes in the administrative domain.
How to see parameter utilization: On UNIX, use the command ps -efl to see the current process priority.
To route servlet requests from the Web server to the Web containers, the product establishes a transport queue between the Web server plug-in and each Web container.
Resource Analyzer displays a metric called Percent Maxed that indicates how much of the time all of the configured threads are in use. If this value is consistently in the double digits, the Web container could be a bottleneck and the number of threads should be increased.
A cache of the requested size is created for each thread. The number of threads is determined by the Web container maximum thread size setting.
Note: A larger cache uses more of the Java heap, so you might need to increase maximum Java heap size. For example, if each cache entry requires 2KB, maximum thread size is set to 25, and the URL invocation cache size is 100; then 5MB of Java heap are required.
Use the Resource Analyzer to view bean performance information.
Security information pertaining to beans, permissions, and credentials is cached. When the cache timeout expires, all cached information becomes invalid. Subsequent requests for the information result in a database lookup. Sometimes, acquiring the information requires invoking a Lightweight Directory Access Protocol (LDAP) bind or native authentication, both of which are relatively costly operations for performance.
Experiment to find the best trade-off for the application, based on the usage patterns and security needs for the site.
The following system properties determine the initial size of the cache's primary and secondary Hashtables, which affect the frequency of rehashing and the distribution of the hash algorithms. The larger the number of available hash values, the less likely a hash collision will occur, and the less likely retrieval will be slow. If several entries compose a cache Hashtable, creating the table with a larger capacity allows the entries to be inserted more efficiently rather than letting automatic rehashing determine the growth of the table. Rehashing causes every entry to be moved each time it occurs.
The Secure Association Service (SAS) feature establishes an SSL connection only if the connection goes outside the JVM (to another JVM). Therefore, if all the beans are co-located within the same JVM, the SSL used by SAS is not expected to hinder performance.
Use the Services tab and then Edit Properties for Object Request Broker, for the default server or any additional application server configured in the administrative domain, to set these parameters.
WARNING: Pass-by-reference can be dangerous and can lead to unexpected results. If an object reference is modified by the remote method, the change might be seen by the caller.
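A sketch of the hazard follows; the class and method are hypothetical stand-ins for a co-located bean call:

    // Sketch: with pass-by-reference, no copy of the argument is made, so a
    // method that mutates its parameter also mutates the caller's object.
    public class PassByReferenceHazard {
        static class Order { String status = "NEW"; }

        static void process(Order o) {      // stands in for a co-located EJB method
            o.status = "PROCESSED";
        }

        public static void main(String[] args) {
            Order order = new Order();
            process(order);
            System.out.println(order.status);  // prints PROCESSED: the caller sees the change
        }
    }

With true RMI pass-by-value semantics, the bean would receive a copy and the caller would still see NEW.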
If the application server expects a large workload for enterprise bean requests, the ORB configuration is critical. Take note of the following properties:
Using JNIReaders requires less memory because only a fixed number of threads is created. It also saves time because threads are created only once, during initialization, and are never destroyed. The JNIReader is a native C implementation and should be faster than the default reader thread.
WARNING: Make sure that the JNI library implementation of the JNIReader is in the WebSphere Application Server bin directory. On Intel platforms the library is Selector.dll; on UNIX it is libSelector.a or libSelector.so. On UNIX, if the prefix lib is missing, rename the file to include it.
Tuning the JVM
The JVM offers several tuning parameters affecting the performance of WebSphere Application Servers (which are primarily Java applications), as well as the performance of your applications.
In general, increasing the size of the Java heap improves throughput until the heap no longer resides in physical memory. After the heap begins swapping to disk, Java performance drastically suffers. Therefore, the maximum heap size needs to be low enough to contain the heap within physical memory.
The physical memory usage must be shared between the JVM and the other applications, for instance, the database. For assurance, use a smaller heap (for example, 64MB) on machines with less memory.
Try a maximum heap of 128MB on a smaller machine (that is, less than 1GB of physical memory), 256MB for systems with 2GB of memory, and 512MB for larger systems. The starting point depends on the application.
If performance runs are being conducted and highly repeatable results are needed, set the initial and maximum sizes to the same value. This eliminates any heap growth during the run. For production systems where the working set size of the Java applications is not well understood, an initial setting of one-fourth the maximum setting is a good starting value. The JVM will then try to adapt the size of the heap to the working set of the Java application.
Use the command line property of the default server or any additional application server you configure in the administrative domain to set the JVM parameters:
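For example, the heap can be set with the standard JVM arguments; the values below are the illustrative starting points discussed above:

    -Xms128m -Xmx512m          (initial size one-fourth of maximum, for production)
    -Xms256m -Xmx256m          (equal sizes, for repeatable performance runs)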
WebSphere Application Server is tightly integrated with a supported database of your choice. For information about supported database products, see the product prerequisites Web site at www.ibm.com/software/webservers/appserv/library.html. WebSphere Application Server uses the database as a persistent backing store for administration, as well as to store session state and enterprise bean data for the application.
If the application uses WebSphere Application Server session state, JDBC database connection pooling or enterprise beans, pay special attention to how these resources and their database settings are configured within the administrative domain. During WebSphere Application Server installation, a database named WASnn is established, where nn is the release identifier, although a different name can be used. This document assumes WAS40 is used.
For scalability, it is likely the database is established on a separate machine, particularly if clustering is used. This relates to the WebSphere Application Server database, any application database, and the WebSphere Application Server session database (if persistent session is used).
If clones are used, one data source pool exists for each clone. This is important when configuring the database server maximum connections.
Use Resource Analyzer to find the optimal number of pool connections that can reduce values for these numbers. If Percent Used is consistently low, consider decreasing the number of connections in the pool.
On UNIX platforms, a separate DB2 process is created for each connection. This quickly affects performance on systems with low memory, and errors can occur.
Each Entity EJB transaction requires an additional connection to the database specifically to handle the transaction. Be sure to take this into account when calculating the number of data source connections.
Deadlock can occur if the application requires more than one concurrent connection per thread, and the database connection pool is not large enough for the number of threads. Suppose each of the application threads requires two concurrent database connections and the number of threads is equal to the maximum connection pool size. Deadlock can occur when both of the following are true:
To prevent the deadlock in this case, the value set for the database connection pool must be at least one higher, so that at least one of the waiting threads can complete its second database connection, freeing up database connections.
To avoid deadlock, code the application to use, at most, one connection per thread. If the application is coded to require C concurrent database connections per thread, the connection pool must support at least the following number of connections, where T is the maximum number of threads.
T * (C - 1) + 1
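For example, if T = 10 threads each require C = 2 concurrent connections, the pool must hold at least 10 x (2 - 1) + 1 = 11 connections; with only 10, each thread can hold one connection and all ten block forever waiting for a second.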
The connection pool settings are directly related to the number of connections that the database server is configured to support. If the maximum number of connections in the pool is raised, and the corresponding settings in the database are not raised, the application fails and SQL exception errors are displayed in the stderr.log file.
Note: Prepared statements are optimized for handling parametric SQL statements that benefit from precompilation. If the JDBC driver specified in the data source supports precompilation, the creation of the prepared statement will send the statement to the database for precompilation. Some drivers might not support precompilation. In this case, the statement might not be sent to the database until the prepared statement is executed.
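A parametric statement of the kind the cache holds looks like this standard JDBC sketch; the table and column names are placeholders:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Sketch: a parametric SQL statement that benefits from precompilation.
    // Re-executing it with different parameter values reuses the compiled form,
    // which is what the data source's prepared statement cache preserves.
    public class PreparedStatementSketch {
        public static void findOrders(Connection conn, int customerId) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM ORDERS WHERE CUSTOMER_ID = ?");  // placeholder table/column
            ps.setInt(1, customerId);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) { /* process rows */ }
            rs.close();
            ps.close();
        }
    }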
Resource Analyzer can help tune this setting to minimize cache discards. Use a standard workload that represents a typical number of incoming client requests, use a fixed number of iterations, and use a standard set of configuration settings.
Follow these instructions to use the Resource Analyzer:
The best value for Data Source > Connection Pooling > Statement Cache Size is the setting that yields either a value of zero or the lowest value for the PrepStmt Cache Discards metric. This indicates the most efficient number for a typical workload.
Other JDBC parameters
In addition to setting the prepared statement cache size, you can set other specific properties for JDBC drivers. For example, with Oracle, you can increase the number of rows to fetch while getting result sets with the following statement:
name="defaultRowPrefetch", value="25"
Enter these types of custom properties on the General tab for the database.
To set BUFFPAGE to a value of n, issue the DB2 command update db cfg for x using BUFFPAGE n and be sure NPAGES is -1 as follows:
db2                                      (go to DB2 command mode; otherwise the following select will not work as is)
connect to x                             (where x is the particular DB2 database name)
select * from syscat.bufferpools         (and note the name of the default, perhaps IBMDEFAULTBP)
alter bufferpool IBMDEFAULTBP size -1    (if NPAGES is already -1, there is no need to issue this command)
(re-issue the above select and NPAGES should now be -1)
An optimization level of 9 causes DB2 to devote a lot of time and all of its available statistics to optimizing the access plan.
For more information, refer to the DB2 documentation and the IBM DB2 Web site.
In order to see if runstats has been done, issue the following command on DB2 CLP:
db2 -v "select tbname, nleaf, nlevels, stats_time from sysibm.sysindexes"
If no runstats has been done, nleaf and nlevels will be filled with -1 and stats_time will have an empty entry "-".
If runstats has already been run, the timestamp of its completion is displayed under stats_time. If the time shown for the previous runstats seems too old, run runstats again.
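A typical invocation looks like the following; the schema and table names are placeholders:

    db2 runstats on table myschema.mytable with distribution and detailed indexes all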
The new setting takes effect immediately.
The following are several metrics that are related to DB2 MinCommit:
For additional information on setting session management parameters, see InfoCenter article about session programming model and environment.
The utilization of this parameter can be observed by enabling trace on the cmm component.
A value of 1 should be used if you want to serially process the messages, that is, only one message bean instance is used to process messages one after another.
A value of 20 gives the best throughput; increasing beyond this value does not increase it further. Based on the message type, the amount of work, and the resources available, use a value between 10 and 20 to obtain maximum message throughput.
The increase in message throughput depends on various factors, such as system resources and listener configuration. System resources refers to the number and power of the processors. Listener configuration refers to the number of sessions utilized and the JMS interactions, including contention in sharing access to the underlying MQ queue manager resources.
Follow these steps:
From the Start menu, choose Programs > Administrative Tools > Performance Monitor.