Monday, December 22, 2008

Basic tools for monitoring garbage collection, and more on JVM memory analysis (draft)

(Draft)
I was having a discussion online. C is better than Java. Java is better than C. You know, the conversation you have everyday on the web and come up with the same conclusions. When people mention the benefits of Java, they normally mention WORA (write once, run anywhere) and Java's garbage collection. The Java developer does not have to worry about his application crashing and what-not. And it just occurred to me. I see more crashes with Java servers and systems than I could ever imagine with systems that were built with C++ or C. I don't think I have worked at a Java shop where they don't periodically reboot their Java application servers. Does this happen with Apache/PHP driven applications? Shrug?. I don't want to get into that debate, but it is the reality in the J2EE/Java world that the Java virtual machine is not immune to crashes or the dreaded 'Out of Memory' Errors. But like everything in life, there are no guarantees, right?. The Java developer is supposed to know what he or she is doing? I am posting this entry to find out a little bit more on why this happens and what is going on with the garbage collection at the time, also a couple of tools to help monitor the JVM.

From wikipedia:

"Java uses an automatic garbage collector to manage memory in the object lifecycle. The programmer determines when objects are created, and the Java runtime is responsible for recovering the memory once objects are no longer in use."

Here is an example C program with a very obvious error.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
char *buffer = (char *) malloc(4 * sizeof(char));
buffer[0] = 'a';
buffer[1] = 'b';
buffer[2] = 'c';
buffer[3] = '\0';
buffer[1000099] = 'd';
printf("%s\n", buffer);
free(buffer);
return -1;
}

Here is the output:
Segmentation fault

Obviously, this is a simple application. Imagine that you are writing a C based web application server. In the millions of millions of lines of code, you didn't properly handle HTTPS requests for clients in France (I am using this as a horrible example) and you have an error just like this one. What happens? The server crashes and you have to call somebody in the middle of the night to reboot it. Does something similar happen in the J2EE world? It actually may happen more often than you might think.

Here is an example of poor Java programming. I am using the String concatenation operator, which will create N many new strings.

public CrashJVM test_SmallString1() throws Exception {

final int size = 1000;
stime();
printMemoryStats("test small string, begin");
for(int i = 0; i < size; i++) {
this.data_append += data_small1;
}

etime();
printMemoryStats("test small string, done");
return this;

}


This is one method, a setter method in my simple bean class. Let's say that
If we ignore that Java does caching, String interning of our String data, we might end up with a String that is 200k chars in size. And creating all of those new objects gets expensive very quickly.

Here is code that I used to crash the JVM due to out of memory errors:

public CrashJVM test_SmallString1() throws Exception {

final int size = 1000;
stime();
printMemoryStats("test small string, begin");
for(int i = 0; i < size; i++) {
this.data_append += data_small1;
}

etime();
printMemoryStats("test small string, done");
return this;

}

public CrashJVM [] test_SmallObjects1() throws Exception {
stime();
printMemoryStats("test small objects, begin");
final int size = 100000;
CrashJVM [] items = new CrashJVM[size];
for(int i = 0; i < size; i++) {
items[i] = new CrashJVM();
}
Thread.sleep(100);
printMemoryStats("test small objects, end");
for(int i = 0; i < size; i++) {
//System.out.println(items[i]);
}
etime();
printMemoryStats("null");
return items;
}

I reduced the amount of load that the JVM was under, so I could get a snapshot of the heap usage. Here is a chart on what happens during the various garbage collections.

Running
6.084: [GC 6.084: [DefNew: 896K->64K(960K), 0.0037010 secs] 896K->196K(5056K), 0.0037720 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
6.309: [GC 6.309: [DefNew: 960K->64K(960K), 0.0047180 secs] 1092K->348K(5056K), 0.0047850 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
6.610: [GC 6.610: [DefNew: 960K->64K(960K), 0.0054940 secs] 1244K->456K(5056K), 0.0055600 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
6.979: [GC 6.979: [DefNew: 960K->64K(960K), 0.0041030 secs] 1352K->626K(5056K), 0.0041770 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
7.190: [GC 7.190: [DefNew: 960K->64K(960K), 0.0030750 secs] 1522K->742K(5056K), 0.0031420 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
10.850: [GC 10.850: [DefNew: 960K->18K(960K), 0.0013450 secs] 1638K->759K(5056K), 0.0014110 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
************************

The output is generated with the 'verbosegc' and other JVM options:
(Note: The duplicated 'verbosegc' options gives more information about the garbage collecting)

java -verbosegc -verbosegc -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps CrashJVM -Xms24m -Xmx32m > gc_anal.txt

There are 1800 data points in the output file and in the image. Each GC line in the gc_anal analysis file contains GC information about a minor garbage collection.
[GC [<collector>: <starting occupancy1> -> <ending occupancy1>, <pause time1> secs] 
<starting occupancy3> -> <ending occupancy3>, <pause time3> secs]

"If the incremental collection of the tenured generation cannot be completed before tenured generation is full, a major collection occurs and is indicated in the verbosegc output by the 'MSC' in the output."


<starting occupancy1> is the occupancy of the young generation
before the collection
<ending occupancy1> is the occupancy of the young generation
after the collection
<starting occupancy3> is the occupancy of the entire heap
before the collection
<ending occupancy3> is the occupancy of the entire heap
after the collection

see http://java.sun.com/docs/hotspot/gc1.4.2/example.html

For this simple example, the chart is not that interesting. During the first 20 seconds, there is a Thread.sleep(2000) call. After that, we allocate many new objects, especially string objects and wait for the JVM to stop working. It is interesting that we are getting close to the maximum heap size of 32MB just based on this simple application.

Simple Server Application

The first 'String/Object' test was not that complex. I wrote the most basic simple HTTP server and tested the JVM with large objects as well as modified how often the client would connect to the server.


public void run() {
running = true;
String line;
try {
BufferedReader bufReader = new BufferedReader(new InputStreamReader(in));
while(running) {
// read in a line from the client
line = bufReader.readLine();
if (line == null)
break;
// and write out the reversed line
System.out.println("[server/" + line.length() + "]" + line);
if (line.length() == 0)
break;
}
// Write a html response back
StringBuffer buf = new StringBuffer();
buf.append("HTTP/1.1 200 Ok\r\n");
buf.append("Server: Apache-Test\r\n");
buf.append("Connection: close\r\n");
buf.append("Content-Type: text/html\r\n");
buf.append("\r\n");
...
...
...
out.println(buf);


this.printMemoryStats("incoming request");
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (out != null) out.close();
if (in != null) in.close();
client.close();
} catch (IOException e2) {;}
System.out.println("[server] closing connection");
} // End of Try - Catch - Finally

}

The code above represents the 'run' method in the server thread. Once a client connects to the server, the server returns with a valid HTTP response. The HTML output is like a tabulated report you might find with a Struts or Spring JSP application. For each row in the report, the following bean is instantiated. Essentially, it is your run on the mill Java POJO bean.
public class SimpleReportBean implements Serializable {
private static final long serialVersionUID = 3415594062848249479L;

private String someNo = "234234...";
private Date sigDate = new Date();
private String userId = "AB9303...";

private BigDecimal purchaseAmount = new BigDecimal(0);
private BigDecimal cPaid = new BigDecimal(0);
private Date bDate = new Date();
private Date dPropDate = new Date();
private Date rDate = new Date();
private String userTitle = "Very Important";

private String mailingAddr1 = "1023 Scary Hollow Ln";
private String mailingAddr2 = "Next to the dumpster, APT 030303030K";
private String mailingCity = "Austin";
private String mailingState = "Texas";
...
...



The garbage collection chart above is output based on light load. I used 'curl' to generate a request every 2 of seconds. There are 10 rows in the report.



The chart above depicts output under heavy load. I removed the delay between requests. There are 500 requests and I increased the number of rows in the report to 100. It is not reaching the maximum heap memory at all, but it is minor garbage collection much more rapidly.



Sun just recently add a useful application to the 1.6 toolset, the heap analysis tool or 'hat'. If you generate the heap profile binary output, you can run this tool on the output file to give you better insight into to what is happening to the heap. What are objects are being created? There is a histogram of the total number of allocated objects and the total size. There is also a sophisticated javascript query language for finding trouble objects. The hat tool reads the input file and then launches a HTTP server application. I used the following script to analyze my GC output.

#!/bin/sh
javac CrashJVM.java
FORMAT=,format=b

# Add profiling and verbose garbage collection
java -verbosegc -Xrunhprof:file=dump.txt${FORMAT} Main -Xms24MB -Xmx32MB > d.txt
# To run the server:
jhat dump.txt




In the figure above, we are looking at the heap histogram in the browser on port 7000.



Here is one variation of the Java code used to test against:

import server.BetterReportBean;
public class Main {
public static void main(String [] args) throws Exception {
System.out.println("Running");
BetterReportBean bean [] = new BetterReportBean[3000];
for (int i = 0; i < 3000; i++) {
bean[i] = new BetterReportBean();
}
Thread.sleep(2000);
for (int i = 0; i < 3000; i++) {
bean[i].toString();
}
System.out.println("Done");

}
}

With the better report bean, we used constant strings and referenced them in our class fields. We did this and made a reference to these constants as opposed to creating new String, Date or BigDecimal objects. Essentially, we avoided creating a bunch of new objects that we didn't need to. Notice the interesting statistics a little further down.

With a main class and the Better Report Bean, class server.BetterReportBean
36 garbage collections were performed.

Here is some output from the garbage colection report:
...
[GC 1277K->390K(5056K), 0.0006050 secs]
[GC 1286K->399K(5056K), 0.0005540 secs]
[GC 1294K->407K(5056K), 0.0005580 secs]
[GC 1303K->416K(5056K), 0.0009580 secs]
[GC 1311K->424K(5056K), 0.0006540 secs]
[GC 1320K->431K(5056K), 0.0007520 secs]
[GC 1327K->431K(5056K), 0.0012980 secs]
...
...
Time to garbage collect:
(* 39 0.00070)
0.027 seconds.

The histogram data from the heap analysis tool shows a total size of:
647,190

With the normal class, Simple Report Bean, there
were 468 garbage collections.

Here is some output from the garbage collection report:
...
GC 15539K->14179K(20844K), 0.0034280 secs]
[GC 15587K->14221K(20844K), 0.0033610 secs]
[GC 15629K->14269K(20844K), 0.0033300 secs]
[GC 15677K->14311K(20844K), 0.0033950 secs]
[GC 15719K->14323K(20844K), 0.0033050 secs]
...
...
Time to garbage collect = 1.55 seconds
(* 470 0.0033050 )

The histogram data from the heap analysis tool shows a total size of:
1,516,712 bytes

Improvement
With a small set of changes to this object, we decreased the amount of memory allocated by half. We decreased the number of garbage collections from 470 minor collections to 40. And it took 0.027 seconds to do the garbage collection as opposed to 1.5 seconds.

public class BetterReportBean implements Serializable {
public static final String a = "23423...";
public static final String b = "AB...";
public static final String c = "Very Important";
public static final String d = "1023 Scary Hollow Ln";
public static final String ca = "CA";
public static final BigDecimal bd = new BigDecimal(0);
public static final Date dd = new Date();
private String someNo = a;
private Date sigDate = dd;
private final String userId = b;
...
...
...

Here is the the full source for the typical bean object and a better one with more use of constants.
Simple Report Bean (basic POJO)
Better Report Bean

Additional Tools (jconsole and IBM's Diagnostic Tool for Java GC)

JConsole is provided with the 1.5+ JDK. The screenshot depicts one of the jconsole views. You simply need to type jconsole at the OS command-line prompt and you are given the option to connect to a particular running Java application process id. I can see where this tool is useful for debugging local or dev J2EE servers.



The Diagnostic Tool for Java Garbage Collector is a tool provided by IBM for monitoring the Websphere server. Websphere will output a custom GC statistic output, typically to a file called native_stderr.log . The diagnostic tool can be used to analyze the file and generate the following application. And even though it is amazingly verbose, you might be able to interpret the raw output. Here is example XML output.

<af type="tenured" id="111" timestamp="Dec 11 08:37:03 2008" intervalms="873.459">
<minimum requested_bytes="131088" />
<time exclusiveaccessms="0.237" />
<tenured freebytes="43965512" totalbytes="176996864" percent="24" >
<soa freebytes="43693384" totalbytes="175227392" percent="24" />
<loa freebytes="272128" totalbytes="1769472" percent="15" />
</tenured>
<gc type="global" id="111" totalid="111" intervalms="876.833">
<refs_cleared soft="0" threshold="32" weak="123" phantom="0" />
<finalization objectsqueued="31" />
<timesms mark="166.795" sweep="7.041" compact="0.000" total="174.089" />
<tenured freebytes="101955256" totalbytes="176996864" percent="57" >
<soa freebytes="98864376" totalbytes="173457408" percent="56" />
<loa freebytes="3090880" totalbytes="3539456" percent="87" />
</tenured>
</gc>
<tenured freebytes="101824168" totalbytes="176996864" percent="57" >
<soa freebytes="98733288" totalbytes="173457408" percent="56" />
<loa freebytes="3090880" totalbytes="3539456" percent="87" />
</tenured>
<time totalms="177.606" />
</af>



Resources

http://java.sun.com/docs/hotspot/gc1.4.2/example.html - Diagnosing a Garbage Collection problem
Most Comprehensive Java Perf Guide - http://www.javaperformancetuning.com/
http://oreilly.com/catalog/javapt/chapter/ch04.html - More good performance tips.
https://hat.dev.java.net/ - Heap analysis tool. This tool is included with JDK6.

Java technology, IBM style: Garbage collection policies, Part 1

http://jvmnotebook.googlecode.com/svn/trunk/misc/heap_analysis/ - Browse the SVN repository.

GC Parse Python Script, tool used to parse the GC minor collection output

http://jvmnotebook.googlecode.com/files/heap_analysis.tar.gz - Full source download.

Addendum (gnuplot and regex)

I always feel that charts and graph are great visuals for displaying simple or complex concepts. I only know the surface level details of gnuplot, but here are some notes on creating the images above. I used a simple python script to analyze the GC statements generated by the JVM verbosegc output and then created a tabulated data file that can used with gnuplot.



I normally create a shell script to set all the gnuplot commands and launch the application:

#!/bin/sh
# Also see:
# http://www.duke.edu/~hpgavin/gnuplot.html
#
# Use 'eog' eye of gnome to view the images

OUTPUT_IMG_FILE=image_gc_analysis.png
INPUT_PLOT_FILE=image_gc_analysis.txt

# Output these commands to a temp file and then run gnu plot
echo "set terminal png
set output '${OUTPUT_IMG_FILE}'
set title 'GC results'
set size 1,1
set key left top
set autoscale
set xlabel 'gc_iter'
set ylabel 'gc_main_paus_time'
plot '${INPUT_PLOT_FILE}' using 1:6 title 'GC Results'
"
> gnuplot_tmp_cmd.tmp

# Run the gnu plot command
gnuplot gnuplot_tmp_cmd.tmp > /dev/null

# For future reference, add a command and title to compare results
# E.g.
# '/tmp/data2' using 10 with lines title 'Benchmark 2',


On Regex
In the gc_parse.py python application, I used a couple of lines of regex to find the GC patterns generated by the verbose gc JVM output.

def get_gc_stats(line):
# Example Output java 1.5:
# 20.049: [GC 20.049: [DefNew: 1941K->137K(1576K), 0.0014153 secs] 11194K->11117K(1984K), 0.0014628 secs]
# Example Output java 1.6:
# 20.049: [GC 28.200: [DefNew: 1921K->137K(1984K), 0.0006890 secs] 23030K->21247K(27320K), 0.0007550 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

# Group1 = First Iter Value
# Group2 = Second 'GC' Iter value
regex_begin_iter = '^(\S*):\s*\[GC\s*(\S*):\s*'
# Group 3 = Young Gen Start Occupancy
# Group 5 = Young Gen End Occupancy
regex_def_new = '\[DefNew:\s*(\S*)(K|M)\-\>(\S*)(K|M)\((.*)$'
regex_full_str = '%s%s' % (regex_begin_iter, regex_def_new)
pattr = re.compile(regex_full_str)
# Group 1 = Main Start Occupancy
# Group 2 = Main End Occupancy
pattr_sub = re.compile('^(.*) (\S*)(K|M)\-\>(\S*)(K|M)\((.*)$')
...
...

------

No comments: