db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Embretsen <John.Embret...@Sun.COM>
Subject OutOfMemoryErrors when testing Derby with DOTS
Date Fri, 27 Jan 2006 13:29:28 GMT
Hi all,

As some of you may have noticed if you have tried testing Derby with
DOTS, or read the document "Using DOTS as Apache Derby System Test"
posted on the Apache Derby web site at
http://db.apache.org/derby/integrate/index.html, it is likely that you
will see OutOfMemoryErrors (OOME) in the logs (derby.log and DOTS
error log) when running (long enough) with default JVM configuration.

I am currently doing some testing using DOTS, and can verify this
unfortunate fact. It is the Derby Network Server that runs out of memory
(details to follow). Since I do not fully accept the "explanation" of
what may be causing this in the above mentioned document, I would like
to share some of my findings with the community, so that we can

a) Determine with certainty the source of the errors: Derby, DOTS,
Java,... ?
b) Figure out why this is happening
c) Find out what we can do to fix this, other than increasing the Java
Heap size ad infinitum.

This is not the shortest E-mail you've seen, but please bear with me, as
this is (IMHO) a non-trivial problem which requires a certain amount of
information for us to be able to find the best solution.


Test setup
--------------------------------------

I've been running DOTS tests with various configurations to reduce the
risk of this being a user error, but here's the main setup I've been
using:

* DOTS (v1.1.1), with mostly "default" settings except a few, such as:
    * DURATION > 48 hours
    * CPU_TARGET = 70 %
    * AUTO_MODE = yes
* Derby 10.1.2.1, Client/Server mode, ClientDriver (client and server on separate hosts)
* Default Derby settings
* Sun's JVM version 1.5.0 (I also see OutOfMemoryErrors using other VM versions)

DOTS has been set up with the help of the documents posted on the above
mentioned web site, and the DOTS User's Guide at
http://ltp.sourceforge.net/dotshowto.php. I am mostly using the test
case ATCJ2, which is "simulating" an online auction application,
although I have also seen the same type of error when using other test
cases.


Symptoms
--------------------------------------

Using a typical heap size (e.g. 128 MB) and size of the Permanent
Generation Space (aka. PermSpace, which probably is the most interesting
part of the heap in this context, as I will explain below) (e.g. 64 MB),
I usually see OutOfMemoryErrors within the first few hours of the test
run.

Here's an example of the first occurrence of an OutOfMemoryError in
derby.log. The Derby Network server was started 2006-01-24 11:09:22.931
GMT, and the DOTS test case was started on the client host a few seconds
later:

2006-01-24 14:11:22.278 GMT Thread[DRDAConnThread_19,5,main] Wrote class org.apache.derby.exe.ac6ca0115bx0108xfc16xe0e7x000007823a688d6
to file ac6ca0115bx0108xfc16xe0e7x000007823a688d6.class. Please provide support with the file
and the following exception information: java.lang.OutOfMemoryError: PermGen space
2006-01-24 14:11:22.279 GMT Thread[DRDAConnThread_19,5,main] (XID = 212876), (SESSIONID =
18), (DATABASE = /export/home2/tmp/je159969/DOTS/db/server/TESTDB), (DRDAID = O19F70EC.D025-939843809132829725{19}),
Cleanup action starting
2006-01-24 14:11:22.279 GMT Thread[DRDAConnThread_19,5,main] (XID = 212876), (SESSIONID =
18), (DATABASE = /export/home2/tmp/je159969/DOTS/db/server/TESTDB), (DRDAID = O19F70EC.D025-939843809132829725{19}),
Failed Statement is: SELECT BID_PRICE,BID_COUNT FROM ITEM WHERE ITEMID = 'ITEM7990'
java.lang.OutOfMemoryError: PermGen space
Cleanup action completed

The test (and the Derby Server) continued to run, but kept throwing
OutOfMemoryErrors, and to a larger degree failed to execute statements
sent by the client. After a while (~2 days), the Network Server was not
able to do much of anything. Within that time, about 800 classes of the
type mentioned in the log excerpt above ended up in
${derby.system.home}, although I did not tell Derby to dump generated
classes.

I was monitoring the server heap using jstat (a Sun JDK monitoring tool,
see http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstat.html ), and
as you can see from the attached graph (PU_jstat-gc-t.png) based on the
jstat output, the PermSpace was filling up rather quickly. After about 3
hours (more accurately 10920 seconds), the first OutOfMemoryError was
thrown. I've attached a graph showing the first 4 hours (14400 seconds)
of the test run, so that you can see in more detail what happened
(PU-first4hrs_jstat-gc-t.png).

I've also seen errors of the type "OutOfMemoryError: Java Heap Space" in
derby.log when running DOTS, but "OutOfMemoryError: PermGen space" is
the (by far) most frequently occurring OOME.

The type of error and Mean Time To Failure (MTTF) of course varies with
your test setup, JVM configuration as well as the hardware you are
using. However, I believe a maximum heap size of 128 MB and maximum
PermSpace of 64 MB is close to the typical default values for JVMs (both
client and server mode) on today's desktops.


Source of errors, Chain of events
--------------------------------------

Since I don't know much about the internals of Derby I'm not very sure
about any of this, but here's what I think, based on the log files and
JVM behavior:

The DOTS test case contains both prepared and unprepared statements. As
you can see from the log excerpt above, the statement which "caused" the
OOME in this case (other statements have been reported as the failing
ones in other test runs) was

SELECT BID_PRICE,BID_COUNT FROM ITEM WHERE ITEMID = 'ITEM7990'

Looking at the DOTS source code for ATCJ2.java, we see that this is the
String getBidItemSQL concatenated with the String "'ITEM7990'".
(You can get the source code for DOTS at
http://sourceforge.net/project/showfiles.php?group_id=3382 )

Here are three other relevant code lines (download DOTS to see full
context):

Statement stmt = null;

stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,ResultSet.CONCUR_READ_ONLY);

rs = stmt.executeQuery(getBidItemSQL + "'" + itemID + "'");

The statement and the result set are explicitly closed iff rs.next()
returns true.

I guess that since a PreparedStatement is not being used here, the Derby
engine compiles a new statement like this every time the method doBid()
is being called (which is quite often), since the item id is varying quite
a lot.

The test case also use PreparedStatements. Some of these are "normal",
meaning that the statement strings contain question mark placeholders,
and that setXXX and getXXX methods are used to set/get the values. The
odd thing is that the PreparedStatements (at least some of them) are not
reused, but closed, nulled, and re-prepared. Although this is not
recommended practice performance-wise, I would like to think that this
should not be the cause of a Derby memory leak, although DERBY-210
describes issues with the Derby Server when such statements are _not_
explicitly closed by the user.

The third kind of statement used in the DOTS test case is
PreparedStatements with some question mark placeholders, but where an
item id is appended to the end this statement string every time it is
used. Here is an example:

/** SQL for updating bid related columns in ITEM table */
private final static String updateBidItemSQL =
         "UPDATE ITEM SET BID_PRICE = ?,BID_COUNT=? WHERE ITEMID = ";

I don't know the rationale behind doing it this way. It may be just "sloppy
code", or it may be intentional.

Now, it seems to me that the sum of all this is that lots and lots of
different statements are being compiled and executed by the derby engine
in a relatively short period of time, which involves a whole lot of
generated byte code. This might explain why we run out of PermSpace...

As far as I understand (I am not an expert in this area), the PermSpace
part of the heap is where the actual classes (not the object instances)
are stored during JVM execution. It seems that during such test runs
Derby generates new classes faster than the JVM is able to garbage
collect them, thus eventually running out of PermSpace.

My limited experiments with somewhat larger heaps and PermSpaces
indicate that the problem does not go away, it just takes a bit longer
to fill up the PermSpace.

The problem is probably worse when using a 1.3.x JVM (I did not try
this), since, according to
http://www.informit.com/guides/content.asp?g=java&seqNum=98&rl=1 , 1.3.x
versions of the JVM do not unload classes from its PermSpace.


Now what?
--------------------------------------

I am interested in finding out what is causing these errors,
and what can be done to avoid them in the future.

It might be that changing the source code of DOTS will make the problem
go away (for example by changing the way PreparedStatements are used, or
by going through the code and making sure all statements and result sets
are explicitly closed); I have not tried this yet.

However, shouldn't Derby be able to handle even this type of code in a
more graceful manner? Or are the DOTS test cases so "poorly" designed
that we cannot expect any relational database system to run them
successfully for more than a few hours with default JVM settings?

I have noticed that there are several open or recently resolved
JIRA-issues treating OutOfMemoryErrors (e.g. DERBY-210, DERBY-443,
DERBY-557, DERBY-756), but I have not (yet) been able to link either of
those directly with the symptoms I see using DOTS.

I have not run any DOTS tests recently using the Derby trunk, but perhaps
others out there have tried this? Do you get OutOfMemoryErrors?

Has anyone tried DOTS (relatively long test runs) with other databases?

More testing and closer inspection of some of the DOTS source code is on
my agenda - but if any of you have any ideas, or comments, feel free to
speak out :)

Thanks for reading this,


-- 
John

Mime
View raw message