HPjmeter 4.1 User's Guide

Chapter 5 Profiling Applications

HPjmeter allows you to process profile data from Java virtual machines.

Separating the profile data collection step from the analysis step has the following advantages:

  • The data analysis can be done at a different time and on a different platform than was used to run the application. For example, it can be done on a desktop system or on a laptop.

  • A non-interactive profiling agent will often impose less overhead than an interactive one.

  • The profile data files obtained naturally facilitate comparison of different runs or creation of a history of performance improvements.

The -Xeprof profiling option, available for the HP-UX HotSpot™ VM, was specifically designed to produce profile data files for HPjmeter. This option works well for capturing performance data for viewing, and can now be enabled during a program run as well as when starting an application.

-Xeprof focuses primarily on performance problems that characterize large server applications. Its relatively low overhead allows you to collect performance data such as the delay caused by lock contention, actual CPU time used by Java methods, and actual profiler overhead.

Using the -Xeprof switch on HP-UX 11.31:

For best results when using the -Xeprof switch on HP-UX 11.31 on Integrity systems, you should run Java 1.5.0.14 or later, or 6.0.02 or later, because the thread timing data generated by earlier releases of Java can be inaccurate on 11.31.

How to tell when the thread timing data is inaccurate: Because Java can generate an eprof data file with no errors or other indication of a problem in this situation, you may not discover that the file is inaccurate until you try to open it with HPjmeter. At that point, HPjmeter will either refuse to load the file, or it will load the file but display unusual results.

If HPjmeter refuses to load the file, it will display an error message such as

Number format error at line NNN. Cannot continue.

If HPjmeter does load the file, you will see unexpected and inaccurate results in the metric displays. The unexpected results can be seen most easily by examining the Threads Histogram. If you see many threads spending all their time in an unexpected state, such as "Unknown" or "Lock Contention", then you are experiencing the problem with inaccurate data. Update your Java installation to one of the versions mentioned above to correct the problem.

Although many features of HPjmeter are available only when -Xeprof is used to capture the profile data, useful profile data can be obtained from almost any JVM. No special compilation or preprocessing of the application code is needed, and you do not need to have access to the source code to get the profiling data for a Java program.

This guide also presents information on running the JVM with -agentlib:hprof, which also provides numerous statistics, some of which are especially useful for detailed profiling of memory usage.

Profiling Overview

Profiling an application means investigating its runtime performance by collecting metrics during its execution. One of the most popular metrics is method call count: the number of times each function (method) of the program was called during a run. Another useful metric is method clock time: the actual time spent in each of the methods of the program. You can also measure CPU (central processing unit) time, which directly reflects the work done on behalf of the method by any of the computer's processors. CPU time does not include I/O, sleep, context-switch, or wait time.
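The difference between clock time and CPU time can be observed directly from Java code. The following sketch is a hypothetical illustration (not part of HPjmeter) that uses the standard ThreadMXBean API to measure both for a workload that first computes and then sleeps; it assumes the JVM supports per-thread CPU timing:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class TimeMetrics {
    /** Returns {wallNanos, cpuNanos} for a workload that computes, then sleeps. */
    public static long[] measure() throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = bean.getCurrentThreadCpuTime();

        // CPU-bound work: counted in both clock time and CPU time.
        long sum = 0;
        for (int i = 0; i < 10_000_000; i++) sum += i;

        // Sleeping: counted in clock time, but not in CPU time.
        Thread.sleep(200);

        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = bean.getCurrentThreadCpuTime() - cpuStart;
        if (sum < 0) throw new AssertionError(); // keep the loop observable
        return new long[] { wallNanos, cpuNanos };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = measure();
        System.out.println("clock time (ms): " + r[0] / 1_000_000);
        System.out.println("CPU time   (ms): " + r[1] / 1_000_000);
    }
}
```

Because the 200 ms sleep is charged to clock time but not to CPU time, the reported CPU time will be noticeably smaller than the clock time.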

Generally, a metric is a mapping that associates numerical values with static or dynamic program elements such as functions, variables, classes, objects, types, or threads. The numerical values may represent various resources used by the program.

For in-depth analysis of program performance, it is useful to analyze a call graph. Call graphs capture the “call” relationships between the methods. The nodes of the call graph represent the program methods, while the directed arcs represent calls made from one method to another. In a call graph, the call counts or the timing data are collected for the arcs.

Tracing

Tracing is one of two methods discussed here for collecting profile data. Java virtual machines use tracing with reduction. Here is how it works: the profile data is collected whenever the application makes a function call. The calling method and the called method (sometimes called “callee”) names are recorded along with the time spent in the call. The data is accumulated (this is “reduction”) so that consecutive calls from the same caller to the same callee increase the recorded time value. The number of calls is also recorded.
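The reduction step described above can be sketched as follows. This hypothetical tracer (an illustration, not HPjmeter's implementation) folds repeated calls on the same caller/callee arc into a single accumulated entry:

```java
import java.util.HashMap;
import java.util.Map;

public class ReducingTracer {
    /** Accumulated profile data for one call-graph arc (caller -> callee). */
    static final class Arc {
        long calls;      // number of calls recorded on this arc
        long timeNanos;  // total time spent in the callee for these calls
    }

    private final Map<String, Arc> arcs = new HashMap<>();

    /** Called by the profiling agent on every method return (the trace event). */
    public void record(String caller, String callee, long elapsedNanos) {
        // Reduction: consecutive calls on the same arc increase the recorded
        // count and time instead of producing one trace record per call.
        Arc a = arcs.computeIfAbsent(caller + "->" + callee, k -> new Arc());
        a.calls++;
        a.timeNanos += elapsedNanos;
    }

    public long calls(String caller, String callee) {
        Arc a = arcs.get(caller + "->" + callee);
        return a == null ? 0 : a.calls;
    }

    public long timeNanos(String caller, String callee) {
        Arc a = arcs.get(caller + "->" + callee);
        return a == null ? 0 : a.timeNanos;
    }
}
```

Note that the arcs map is exactly the call graph described earlier: the keys identify the directed arcs, and the call counts and timing data are stored per arc.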

Tracing requires frequent reading of the current time (or measuring other resources consumed by the program), and can introduce large overhead. It produces accurate call counts and the call graph, but the timing data can be substantially influenced by the additional overhead.

Sampling

In sampling, the program runs at its own pace, but from time to time the profiler checks the application state more closely by temporarily interrupting the program's progress and determining which method is executing. The sampling interval is the elapsed time between two consecutive status checks. Sampling uses “wall clock time” as the basis for the sampling interval, but only collects data for the CPU-scheduled threads. The methods that consume more CPU time will be detected more frequently. With a large number of samples, the CPU times for each function are estimated quite well.

Sampling is a complementary technique to tracing. It is characterized by relatively low overhead and produces fairly accurate timing data (at least for long-running applications), but it cannot produce call counts. Also, the call graph is only partial: a number of less significant arcs and nodes will usually be missing.
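A minimal sampler can be sketched in pure Java using Thread.getAllStackTraces(). This is an illustrative toy, not HPjmeter's mechanism; in particular, it also samples blocked threads, whereas a sampler of the kind described above counts only CPU-scheduled threads:

```java
import java.util.HashMap;
import java.util.Map;

public class Sampler {
    private final Map<String, Integer> hits = new HashMap<>();

    /** Take one sample: record the top-of-stack method of each live thread. */
    public void sampleOnce() {
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            if (stack.length == 0) continue;  // thread had no frames at sample time
            StackTraceElement top = stack[0]; // the method executing right now
            hits.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
        }
    }

    /** Collect sampleCount samples spaced intervalMillis apart. */
    public Map<String, Integer> run(int sampleCount, long intervalMillis)
            throws InterruptedException {
        for (int i = 0; i < sampleCount; i++) {
            sampleOnce();
            Thread.sleep(intervalMillis); // the sampling interval
        }
        return hits; // hit counts approximate where time is being spent
    }
}
```

Methods that run longer are caught at the top of the stack in more samples, which is why the hit counts approximate relative time consumption while never yielding exact call counts.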

See also Data Sampling Considerations.

Tuning Performance

The application tuning process consists of three major steps:

  • Run the application and generate profile data.

  • Analyze the profile data and identify any performance bottlenecks.

  • Modify the application to eliminate the problem.

In most cases you should check whether the performance problem has been eliminated by running the application again and comparing the new profile data with the previous data. In fact, the whole process should be repeated until reasonable performance expectations are met.

To be able to compare the profile data meaningfully, you need to run the application using the same input data or load (which is called a benchmark) and in the same environment. See also Preparing a Benchmark.

Remember the 80-20 rule: in most cases 80% of the application resources are used by only 20% of the program code. Tune those parts of the code that will have a large impact on performance.

There are two important rules to remember when modifying programs to improve performance. These might seem obvious, but in practice they are often forgotten.

  • Don't put performance above correctness.

    When you modify the code, and especially when you change some of the algorithms, always take care to preserve program correctness. After you change the code, you'll want to test its performance. Do not forget to test its correctness. You don't have to perform thorough testing after each minor change, but it is certainly a good idea to do this after you're done with the tuning.

  • Measure your progress.

    Try to keep track of the performance improvements you make. Some of the code changes, although small, can cause great improvements. On the other hand, extensive changes that seemed very promising may yield only minor improvements, or improvements that are offset by a degradation in another part of the application. Remember that good performance is only one of the factors determining software quality. Some changes (for example, inlining) may not be worth it if they compromise code readability or flexibility.
