This topic illustrates that
solving a performance problem
is an iterative process and shows how to troubleshoot performance
problems.
About this task
Solving a performance problem is frequently an iterative process of:
- Measuring system performance and collecting performance data
- Locating a bottleneck
- Eliminating a bottleneck
This process is often iterative because when one bottleneck is removed, performance is then constrained by some other part of the system. For example, replacing slow hard disks with faster ones might shift the bottleneck to the CPU of the system.
Measuring
system performance and collecting performance data
- Begin by choosing a benchmark, a standard set of operations to run. The benchmark must exercise the application functions that are experiencing performance problems. Complex systems frequently need a warm-up period to cache objects, optimize code paths, and so on. System performance during the warm-up period is usually much slower than after it. The benchmark must therefore generate work that warms up the system before the measurements that are used for performance analysis are recorded; a sketch of such a harness follows this list. Depending on the system complexity, a warm-up period can range from a few thousand transactions to longer than 30 minutes.
- If the performance problem under investigation occurs only when a large number of clients use the system, then the benchmark must also simulate multiple users. Another key requirement is that the benchmark must produce repeatable results. If the results vary by more than a few percent from one run to the next, consider the possibility that the initial state of the system is not the same for each run, that the measurements are being made during the warm-up period, or that the system is running additional workloads.
- Several tools facilitate benchmark development. They range from simple tools that invoke a URL to script-based products that can interact with dynamic data generated by the application. IBM® Rational® has tools that can generate complex interactions with the system under test and simulate thousands of users. Producing a useful benchmark requires effort and needs to be part of the development process. Do not wait until an application goes into production to determine how to measure performance.
- The benchmark records throughput and response time results in a form that allows graphing and other analysis techniques. The performance data that is provided by the WebSphere® Application Server Performance Monitoring Infrastructure (PMI) helps you monitor and tune application server performance. See the information on why to use request metrics to learn more about the performance data that is provided by WebSphere Application Server. Request metrics allows a request to be timed at WebSphere Application Server component boundaries, enabling a determination of the time that is spent in each major component.
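The following minimal sketch, which is not tied to any particular load-testing product, illustrates the ideas in this list: it warms up the system before measuring, writes response times in a CSV form that can be graphed, and reports the variation in throughput between runs. The target URL, request counts, and warm-up size are hypothetical placeholders, and the single-threaded loop would need to be extended to simulate multiple users.

import java.io.FileWriter;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.URL;

public class SimpleBenchmark {

    // Hypothetical values; adjust to the application under test.
    private static final String TARGET_URL = "http://localhost:9080/app/endpoint";
    private static final int WARMUP_REQUESTS = 5000;    // warm-up work, not measured
    private static final int MEASURED_REQUESTS = 10000;
    private static final int RUNS = 3;                   // repeat to check repeatability

    public static void main(String[] args) throws Exception {
        // Warm-up period: exercise the system so caches and optimized code paths are in place.
        for (int i = 0; i < WARMUP_REQUESTS; i++) {
            timeRequest();
        }

        double[] throughputPerRun = new double[RUNS];
        try (PrintWriter csv = new PrintWriter(new FileWriter("benchmark.csv"))) {
            csv.println("run,request,responseTimeMs");
            for (int run = 0; run < RUNS; run++) {
                long runStart = System.nanoTime();
                for (int i = 0; i < MEASURED_REQUESTS; i++) {
                    long elapsedMs = timeRequest();
                    csv.printf("%d,%d,%d%n", run, i, elapsedMs);   // graphable form
                }
                double seconds = (System.nanoTime() - runStart) / 1_000_000_000.0;
                throughputPerRun[run] = MEASURED_REQUESTS / seconds;
            }
        }

        // Repeatability check: results that vary by more than a few percent need investigation.
        double min = Double.MAX_VALUE, max = 0;
        for (double t : throughputPerRun) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        double variationPercent = (max - min) / min * 100.0;
        System.out.printf("Throughput per run (req/s): min=%.1f max=%.1f variation=%.1f%%%n",
                min, max, variationPercent);
    }

    // Issues one HTTP request and returns its response time in milliseconds.
    private static long timeRequest() throws Exception {
        long start = System.nanoTime();
        HttpURLConnection conn = (HttpURLConnection) new URL(TARGET_URL).openConnection();
        conn.getResponseCode();   // forces the request to complete
        conn.disconnect();
        return (System.nanoTime() - start) / 1_000_000;
    }
}

If the reported variation exceeds a few percent, revisit the initial state of the system, the warm-up size, and any additional workloads before trusting the measurements.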
Locating
a bottleneck
Consult the
following scenarios and suggested solutions:
- Scenario: Poor performance occurs with only a single user.
Suggested solution: Use request metrics to determine how much each component contributes to the overall response time. See the information on the data that you can collect with request metrics to learn more. Focus on the component that accounts for the most time. Use the Tivoli Performance Viewer to check for resource consumption, including the frequency of garbage collections. You might need code profiling tools to isolate the problem to a specific method.
- Scenario: Poor performance occurs only with multiple users.
Suggested solution: Check whether any systems have high CPU, network, or disk utilization and address those issues. For clustered configurations, check for uneven loading across cluster members.
- Scenario: None of the systems seems to have a CPU, memory, network, or disk constraint, but performance problems occur with multiple users.
Suggested solutions: - Check that work is reaching
the system under test. Ensure that
some external device does not limit the amount of work reaching the
system. Tivoli® Performance Viewer helps determine
the number of requests in the system.
- A thread dump might reveal a bottleneck at a synchronized method or a large number of threads waiting for a resource; a sketch that gathers this evidence programmatically follows this list.
- Make sure that enough threads are available to process the work in IBM HTTP Server, the database, and the application servers. Conversely, too many threads can increase resource contention and reduce throughput.
- Monitor garbage collections with Tivoli Performance Viewer or the verbosegc option of your Java virtual machine. Excessive garbage collection can limit throughput; a sketch that samples garbage collection counters follows this list.
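If taking a full thread dump is not convenient, the standard java.lang.management API can surface similar evidence from inside the JVM. The following sketch, which uses no WebSphere-specific APIs and must run in the JVM being examined, lists threads that are blocked waiting to enter a monitor; many threads reporting the same lock point to a contended synchronized method.

import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedThreadReport {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // Capture a snapshot of every live thread, including the monitors they hold or wait for.
        ThreadInfo[] snapshot = threads.dumpAllThreads(true, true);

        for (ThreadInfo info : snapshot) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                LockInfo lock = info.getLockInfo();
                // Many threads reporting the same lock here indicate a contended critical section.
                System.out.printf("Thread \"%s\" is blocked on %s owned by \"%s\"%n",
                        info.getThreadName(), lock, info.getLockOwnerName());
            }
        }
    }
}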
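Similarly, as a lightweight complement to Tivoli Performance Viewer or verbosegc output, the standard GarbageCollectorMXBean counters can be sampled to estimate how much time is spent in garbage collection. This sketch is generic JMX code, not a WebSphere-specific API, and it must run inside the JVM being monitored (adapting it to a remote JMX connection is not shown); the 10-second interval is an arbitrary choice.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcMonitor {
    public static void main(String[] args) throws InterruptedException {
        long previousTimeMs = 0;
        while (true) {
            long totalCollections = 0;
            long totalTimeMs = 0;
            // Sum the counters across all collectors (names vary by JVM and GC policy).
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                totalCollections += gc.getCollectionCount();
                totalTimeMs += gc.getCollectionTime();
            }
            // A steadily growing share of wall-clock time spent in GC limits throughput.
            System.out.printf("collections=%d, total GC time=%d ms, GC time in last interval=%d ms%n",
                    totalCollections, totalTimeMs, totalTimeMs - previousTimeMs);
            previousTimeMs = totalTimeMs;
            Thread.sleep(10_000);   // sample every 10 seconds
        }
    }
}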
Eliminating a bottleneck
Consider
the following methods to eliminate a bottleneck:
- Reduce the
demand
- Increase resources
Reducing the demand for resources can be accomplished in several ways. Caching can greatly reduce the use of system resources by returning a previously cached response, thereby avoiding the work needed to construct the original response; a minimal application-level caching sketch follows the list below. Caching is supported at several points in the following systems:
- IBM HTTP
Server
- Command
- Enterprise bean
- Operating system
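As an illustration of reducing demand at the application level, separate from the built-in caching points listed above, the following sketch memoizes an expensive response computation. The ResponseCache and buildResponse names are hypothetical placeholders.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ResponseCache {

    // Previously computed responses, keyed by request identifier.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String getResponse(String requestKey) {
        // Return the cached response when one exists, avoiding the work
        // of constructing the original response again.
        return cache.computeIfAbsent(requestKey, this::buildResponse);
    }

    // Stands in for the expensive work (database queries, rendering, and so on).
    private String buildResponse(String requestKey) {
        return "response for " + requestKey;
    }
}

A production cache also needs size limits and invalidation when the underlying data changes, which this sketch omits.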
Application code profiling can lead to a reduction in CPU demand by pointing out hot spots that you can optimize. IBM Rational and other companies have tools to perform code profiling. An analysis of the application might also reveal areas where the work that is performed for some types of transactions can be reduced.
Some resources can be increased by changing tuning parameters, for example, the number of file handles; other resources might require a hardware change, for example, more or faster CPUs, or additional application servers. Key tuning parameters are described for each major WebSphere Application Server component to facilitate solving performance problems. Also, the performance advisors page can provide advice on tuning a production system under a real or simulated load.
Some
critical sections of the application and server code require synchronization
to prevent multiple threads from running this code simultaneously
and leading to incorrect results. Synchronization preserves correctness,
but it can also reduce throughput when several threads must wait for
one thread to exit the critical section. When several threads are
waiting to enter a critical section, a thread dump shows these threads
waiting in the same procedure. Synchronization can often be reduced by changing the code to use synchronization only when necessary, by reducing the path length of the synchronized code, or by reducing the frequency of invoking the synchronized code. The sketch that follows illustrates reducing the synchronized path length.
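In this sketch, only the shared counter update is kept inside the critical section, and the expensive formatting work is moved outside it, so the lock is held briefly. The class and method names are illustrative only.

public class RequestStatistics {

    private long completedRequests;

    // Before: the whole method is synchronized, so threads serialize on the
    // expensive formatting work as well as on the counter update.
    public synchronized String recordAndDescribeSlow(String requestId) {
        completedRequests++;
        return buildDescription(requestId, completedRequests);   // expensive work inside the lock
    }

    // After: only the shared state update is inside the critical section;
    // the expensive work runs outside it.
    public String recordAndDescribe(String requestId) {
        long count;
        synchronized (this) {
            completedRequests++;
            count = completedRequests;
        }
        return buildDescription(requestId, count);                // expensive work outside the lock
    }

    // Stands in for costly, thread-local work that does not touch shared state.
    private String buildDescription(String requestId, long count) {
        return String.format("request %s completed; %d requests completed so far", requestId, count);
    }
}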