
Diagnosing Errors When Monitoring Running Applications

Issues or symptoms you might notice and how to identify problems and remedy them.

Identifying Unexpected CPU Usage by Method

To identify methods with unexpected CPU usage, use the Java Method HotSpots display.

This metric may require a relatively long time to stabilize, and is tuned for large, multi-CPU systems.

To determine if the CPU usage is unexpected, you need to know and understand how the application works.

If this metric shows a large percentage of CPU time spent in just a few methods listed at the top, it indicates that a performance problem may exist or there is room for improvement.

Typically, when the top entry accounts for only a single-digit percentage of CPU time, the application should not have CPU performance issues, unless that entry is an obscure method that you did not expect to see.
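
For illustration only, the hypothetical class below shows a pattern that often dominates such a profile: a regular-expression pattern recompiled on every loop iteration. Hoisting the compilation out of the loop removes the hotspot.

    import java.util.List;
    import java.util.regex.Pattern;

    public class LogFilter {
        // CPU-heavy: the pattern is recompiled for every line, so this method
        // can appear at the top of the Java Method HotSpots display.
        public int countErrors(List<String> lines) {
            int count = 0;
            for (String line : lines) {
                if (Pattern.compile("ERROR\\s+\\d+").matcher(line).find()) {
                    count++;
                }
            }
            return count;
        }

        // Cheaper: compile the pattern once and reuse it.
        private static final Pattern ERROR = Pattern.compile("ERROR\\s+\\d+");

        public int countErrorsFast(List<String> lines) {
            int count = 0;
            for (String line : lines) {
                if (ERROR.matcher(line).find()) {
                    count++;
                }
            }
            return count;
        }
    }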

Viewing the Application Load

Use the Heap Monitor display.

This metric shows how busy the application is: whether it is performing many allocations, which typically corresponds to the load level, or whether it is idle.

When you select a coarse view granularity (1 to 24 hours), and assuming that there is no memory leak, you can observe the overall change in heap size behavior and in the garbage collection pattern. This setting can help you understand the correlation between the application load and the pressure on the heap.

If a significant amount of time seems to be spent in garbage collection (shown as gray in the selected areas of the display), the heap was not adequately sized; it was too small for the load on the application at that time. This behavior may also mean that the load was too high for the given hardware and software configuration.

Checking for Long Garbage Collection Pauses

Use the Heap Monitor display and select the shortest time interval.

Most garbage collections take at most a few seconds to execute. On very large heaps, however, a collection may take several minutes.

Although the display does not show numerical values for garbage collection duration, you can look for extra-wide garbage collection bars, which correspond to long garbage collection pauses. Such pauses will likely cause transient service-level objective (SLO) violations.

If intermittent, very long garbage collections are a potential problem in a given environment, you can select appropriate garbage collection algorithms with a JVM option. Refer to your JVM documentation.
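
As an illustration only, HotSpot-based JVMs of this generation commonly accept collector-selection options such as the following; the exact set of supported collectors varies by JVM release, so confirm the options in your JVM documentation.

    -XX:+UseParallelGC           selects the parallel (throughput) collector
    -XX:+UseConcMarkSweepGC      selects the concurrent low-pause collector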

Checking for Application Paging Problems

Use the Heap Monitor display, selecting a short time interval.

If multiple consecutive garbage collections take an extraordinarily long time to run, this may indicate an excessive paging problem, called thrashing, in which the physical memory available to the application is too small for the specified maximum heap size.

You can verify this using a system tool such as HP GlancePlus. Possible remedies for thrashing include:

  • Decrease the maximum heap size (see the example after this list); this also decreases the maximum load the application can support.

  • Eliminate other load on the system.

  • Install more physical memory.
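
For reference, the maximum heap size is normally set with the standard -Xmx option at JVM startup; the value below is purely illustrative, and MyApplication.jar is a placeholder name.

    java -Xmx512m -jar MyApplication.jar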

Identifying Excessive Calls to System.gc()

Use the Heap Monitor display.

When a high level of detail is selected (1 to 20 minutes), this metric can help detect whether calls to System.gc() are creating a performance problem. When your application calls this method, you see that the heap size does not reach its local maximum before a garbage collection happens. However, from time to time the JVM may automatically invoke garbage collection in such circumstances as well.

A rule of thumb is that if over half of all garbage collections seem to happen when the heap is not full, explicit calls to System.gc() are probably occurring. Remove explicit calls from your application, either by modifying the code or by using an appropriate JVM option. These calls rarely improve the overall performance of the application.
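
The hypothetical sketch below shows the kind of explicit call to remove; on HotSpot-based JVMs such calls can also typically be neutralized with the -XX:+DisableExplicitGC option, although behavior may vary by JVM release.

    public class BatchProcessor {
        public void processBatch(java.util.List<Object> records) {
            for (Object record : records) {
                handle(record);
            }
            // Avoid this: it forces a collection after every batch and rarely
            // helps performance. Let the JVM decide when to collect, or use
            // -XX:+DisableExplicitGC (HotSpot-based JVMs) to ignore such calls.
            System.gc();
        }

        private void handle(Object record) {
            // ... application-specific processing ...
        }
    }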

Reviewing the Percentage of Time Spent in Garbage Collection

The percentage of time your application spends in garbage collection can help you identify potential problems. Use the % Time Spent in Garbage Collection display to view this information. For details, see Percentage of Time Spent in Garbage Collection.

In this example, you see fairly normal application behavior.

  • A low flat graph

  • A low average value, represented by the red line

Figure 4-1 Example Metric: Percentage of Time Spent in Garbage Collection When Application Behavior is Normal

This next example shows an application with a potential memory leak.

  • A rising trend in the percentage of time spent in garbage collection.

Figure 4-2 Example Metric: Percentage of Time Spent in Garbage Collection When Application Shows Potential Memory Leak

Checking for Proper Heap Sizing

You can improve program performance by allocating the optimal amount of memory for the heap, according to the needs of the program and the operation of the JVM garbage collection routine. Checking activity in the heap and comparing it with garbage collection frequency and duration can help you determine the optimal heap size for best performance from your application.

The heap and garbage collection metrics described in this chapter give insight into these aspects of the system.

For detailed GC analysis, run your application with -Xverbosegc and use the HPjmeter GC viewer to analyze GC activity in the heap. For additional details on memory usage, run your application with -agentlib:hprof.
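
For example, an application might be launched as follows to capture data for these tools; the -Xverbosegc sub-option syntax varies by JVM release, and MyApplication.jar is a placeholder, so check your JVM documentation for the exact form.

    java -Xverbosegc:file=gc.vgc -jar MyApplication.jar
    java -agentlib:hprof=heap=sites -jar MyApplication.jar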

Confirming Java Memory Leaks

HPjmeter can automatically detect Java memory leaks, alerting you about the issue before the application crashes.

After running for some time, Java applications can terminate with an out-of-memory error, even if they initially had an adequate amount of heap space. The direct cause of this error is the inability of the garbage collector to reclaim enough heap memory to continue.

The root cause of this problem is that some objects, due to design or coding errors, remain live. Such objects, called lingering objects, tend to accumulate over time, clogging the heap, causing multiple performance problems, and eventually leading to an application crash. Although the nature of this phenomenon is different from memory leaks in C/C++, Java lingering objects are also commonly called memory leaks.
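
A common source of lingering objects is a collection with application-wide lifetime that is only ever added to. The class below is a hypothetical sketch of the pattern, not code from any particular application.

    import java.util.HashMap;
    import java.util.Map;

    public class SessionRegistry {
        // A static map lives for the lifetime of the application.
        private static final Map<String, Object> SESSIONS = new HashMap<String, Object>();

        public static void register(String id, Object session) {
            // Entries accumulate; if nothing ever removes them, every registered
            // session remains reachable (lingers) and the heap slowly fills up.
            SESSIONS.put(id, session);
        }
    }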

A more subtle problem that can cause an out-of-memory error occurs when the Permanent Generation area of memory becomes full of loaded classes. Running your application with -agentlib:hprof can provide class loading data that can help reveal whether this problem is occurring.

Determining the Severity of a Memory Leak

When HPjmeter automatically reports memory leaks, use the Garbage Collections and Heap Monitor displays to help you visualize object retention. Object retention may indicate Java memory leak problems.

Selecting a large viewport, one hour or longer, allows you to easily see the trend in the heap size after garbage collection and its rate of increase, if any. You can visually assess the danger associated with the memory leak and estimate, based on your knowledge of the total heap size, whether the application will run out of memory in a matter of minutes, hours, or days.

The Heap Monitor and Garbage Collections data help you determine if an observed application termination could have been caused by running out of heap space.

The Heap Monitor gives an idea of the overall heap limit, while the Garbage Collections display shows the size of the heap after each garbage collection.

If the heap size shown by Garbage Collections converges toward the heap limit, the application will run out of memory, or already has.

Identifying Excessive Object Allocation

Use the Allocated Object Statistics by Class display.

This metric shows the most frequently allocated object types.

Verify your understanding of the application memory use pattern with this metric. Resolving this problem requires knowledge of the application.

Identifying the Site of Excessive Object Allocation

Use the Allocating Method Statistics display.

This metric shows the methods that allocate the most objects. This metric is useful when you decide to decrease heap pressure by modifying the application code.
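
For example, a method that allocates a temporary object on every loop iteration can dominate this display; reusing the object reduces heap pressure. The class below is a hypothetical sketch.

    import java.util.List;

    public class OrderPrinter {
        // Allocation-heavy: a new StringBuilder is created for every order.
        public void printAll(List<String> orders) {
            for (String order : orders) {
                StringBuilder line = new StringBuilder();
                line.append("order=").append(order);
                System.out.println(line);
            }
        }

        // Reduced heap pressure: one StringBuilder is reused across iterations.
        public void printAllReusing(List<String> orders) {
            StringBuilder line = new StringBuilder();
            for (String order : orders) {
                line.setLength(0);                 // reset instead of reallocating
                line.append("order=").append(order);
                System.out.println(line);
            }
        }
    }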

Identifying Abnormal Thread Termination

Look for the Abnormal Thread Termination Alert, which allows you to see a list of the uncaught exceptions.

You can then use the Thread Histogram to investigate possible problems.

Terminated threads appear as discontinued rows. HPjmeter cannot tell whether a thread was prematurely terminated or simply completed normally. This is a judgment call, but in many cases the analysis does not require extensive application knowledge.

If just one thread from a group of similar threads (by name, creation time and characteristics) terminates, it is very likely that it was stopped.

A thread terminating abnormally does not necessarily bring the application down, but can frequently cause the application to be unresponsive.

Identifying Multiple Short-lived Threads

Use the Thread Histogram.

Threads are relatively costly to create and dispose of; thread pooling is usually a more efficient solution.
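
A minimal sketch of the pooling approach, using the standard java.util.concurrent executor framework; the class name and pool size are hypothetical.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class RequestDispatcher {
        // A fixed pool reuses a bounded number of threads instead of creating
        // a short-lived thread for every request.
        private final ExecutorService pool = Executors.newFixedThreadPool(8);

        public void dispatch(Runnable request) {
            pool.execute(request);       // instead of: new Thread(request).start();
        }

        public void shutdown() {
            pool.shutdown();             // release the pool threads when finished
        }
    }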

Identifying Excessive Lock Contention

Use the Thread Histogram.

There is no simple answer to how much lock contention is excessive. A multi-threaded application normally exhibits some lock contention. With the Thread Histogram display, you can identify the threads that clash over the same lock by visually comparing the pattern for the lock contention (the red area) across several threads.

When a small number of threads are involved in contention, you need to compare the red area from one thread to the red and orange areas in other threads. This may help identify the locks involved, but it requires understanding the threading of the given application.

Some applications, such as WebLogic from BEA, use basic-level synchronization for multiple threads reading from a single socket. This typically appears as very high lock contention for the involved threads. If there are N threads using this pattern, the average lock contention for them will be (N-1)/N*100%. Even though this kind of lock contention seems to be harmless, it unnecessarily stresses the underlying JVM and the operating system kernel, and usually does not bring a positive net result. There is a WebLogic option that can fix this. See the release notes for your version of WebLogic for details.

You may attempt to decrease the level of lock contention by decreasing the number of involved threads. If this does not help, or if it decreases the application throughput, you should deploy the application as a cluster. At the same time, large servers, for example 8-way, can be re-partitioned as a group of smaller virtual servers.

If the lock contention appears in your own application code, rather than in a third-party application, you may be able to change your code to use different algorithms or different data grouping.
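
As one sketch of different data grouping, the hypothetical counter class below replaces a single coarse lock with per-key state in a ConcurrentHashMap, so threads updating different keys no longer contend for the same monitor.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    public class HitCounters {
        // High contention: every caller serializes on the same lock.
        private final Map<String, Long> coarse = new HashMap<String, Long>();

        public synchronized void incrementCoarse(String key) {
            Long old = coarse.get(key);
            coarse.put(key, old == null ? 1L : old + 1L);
        }

        // Lower contention: updates for different keys proceed in parallel.
        private final ConcurrentHashMap<String, AtomicLong> fine =
                new ConcurrentHashMap<String, AtomicLong>();

        public void incrementFine(String key) {
            AtomicLong counter = fine.get(key);
            if (counter == null) {
                AtomicLong fresh = new AtomicLong();
                AtomicLong existing = fine.putIfAbsent(key, fresh);
                counter = (existing != null) ? existing : fresh;
            }
            counter.incrementAndGet();
        }
    }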

Identifying Deadlocked Threads

HPjmeter allows you to enable an alert that identifies deadlocked threads. You can then use the Thread Histogram to get more specific information.

Deadlocked threads represent a multi-threaded program error.

Thread deadlock is the most common cause of application unresponsiveness. Even if the application is still responding, expect SLO violations.

Restart the application that experienced the deadlock, and fix the application code to avoid future deadlocks.
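
The classic deadlock pattern, sketched below with hypothetical account objects, is two threads acquiring the same pair of locks in opposite order; imposing a single global lock ordering is the usual code-level fix.

    public class TransferService {
        // Deadlock-prone: thread A calls transfer(x, y) while thread B calls
        // transfer(y, x); each holds one lock and waits forever for the other.
        public void transferUnsafe(Object from, Object to) {
            synchronized (from) {
                synchronized (to) {
                    // ... move funds ...
                }
            }
        }

        // Fix: always acquire the two locks in the same global order.
        // (Production code also needs a tie-breaking lock for the rare case
        // where the two identity hash codes are equal.)
        public void transferSafe(Object from, Object to) {
            Object first = System.identityHashCode(from) <= System.identityHashCode(to) ? from : to;
            Object second = (first == from) ? to : from;
            synchronized (first) {
                synchronized (second) {
                    // ... move funds ...
                }
            }
        }
    }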

Identifying Excessive Thread Creation

Use the Thread Histogram to identify excessive thread creation.

The Thread Histogram shows the number of threads existing in the observed period of time. By selecting the smallest time interval, one minute, you can see an estimate of the number of simultaneously active threads. Most operating systems have a limited capacity for the number of threads a single process may create. Exceeding this capacity may cause a crash. Although HPjmeter cannot indicate whether or not this will happen, the intensity of the new thread creation may suggest deeper analysis is needed in this area.

Adjusting kernel parameters may increase the threads per process limit.

Sometimes an application may exhibit a thread leak, in which the number of dynamically created threads is unconstrained and grows continuously. Tuning kernel parameters is unlikely to help in this situation.

Identifying Excessive Method Compilation

HPjmeter allows you to enable an alert that identifies excessive method compilations. You can then use the Method Compilation Count display to view the specific method or methods that caused the alert.

Symptoms of excessive method compilation include poor application response because the JVM is spending time re-compiling methods.

The Method Compilation Count display shows the methods that have been compiled, sorted by number of compilations.

Figure 4-3 Example Metric: Method Compilation Count

Excessive method compilation is rare, especially with newer JVM releases. Sometimes, due to a defect in the JVM, a particular method can get repeatedly compiled and de-optimized, which means that the execution of this method oscillates between interpreted and compiled code. Due to the stress on the JVM, this phenomenon is typically much more costly than running the method exclusively in interpreted mode.

Repeated compilation problems may result in SLO violations and will affect some or all transactions. Read the JVM Release Notes to learn how to disable compilation of selected methods for the HP Java HotSpot Virtual Machine.

Identifying Too Many Classes Loaded

Use the Loaded Classes display.

Use this display to determine whether the pool of classes loaded into memory stabilizes over time at a constant value, which is normal.

In some JVMs, like HotSpot, loaded classes are located in a dedicated memory area, called the Permanent Generation, which is typically much smaller than the whole heap. If the application repeatedly loads new classes, either from external sources or by generating them on the fly, this area overfills and the application terminates abnormally. In that case, the value of this metric grows constantly up to the point of failure.

If you find that the application terminates because the number of loaded classes is too large for the area of memory dedicated to this purpose, you can increase the memory area for class storage by using a JVM option. However, if changing the dedicated memory size does not help because the number of loaded classes is unconstrained, this is an application design issue that must be solved.
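
A hypothetical sketch of the unconstrained case: classes stay loaded as long as their defining class loader is reachable, so keeping a reference to every class loader ever created, for example for plug-ins, steadily fills the class storage area. On HotSpot-based JVMs of this generation that area is typically sized with an option such as -XX:MaxPermSize; confirm the exact option in your JVM documentation.

    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.ArrayList;
    import java.util.List;

    public class PluginManager {
        // Keeping every loader reachable keeps every class it loaded reachable,
        // so the class storage area grows without bound.
        private final List<ClassLoader> loaders = new ArrayList<ClassLoader>();

        public Class<?> loadPlugin(URL pluginJar, String className) throws Exception {
            URLClassLoader loader = new URLClassLoader(new URL[] { pluginJar });
            loaders.add(loader);        // leak: the loader is never released
            return loader.loadClass(className);
        }
    }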

NOTE: An overflow of the area dedicated to class storage can cause the application to terminate, sometimes with an OutOfMemoryError message, even if there is plenty of space available on the heap. Consult your JVM documentation on how to increase the area dedicated for class storage, if necessary.