db-derby-user mailing list archives

From "Richard G. Hash" <Richard.H...@openspirit.com>
Subject RE: Question about how Derby uses its "tmp/" sub-directory - and what to do when it goes missing!
Date Wed, 08 Sep 2010 16:42:43 GMT
Bryan,
Thanks for the information.  I will submit a bug as soon as we can pin it down to something
more useful than "some problem occurs" ;-)

In our case I have not been leaning towards resource constraints, for a couple of reasons.
We ran into the max-file-descriptor issue a long time ago, and it was pretty obvious that
was the problem. We typically run on Linux and set the max-open-files limit to the hard
limit or 1024, whichever is bigger.

In the case of the SELECT...ORDER BY, it would only have returned a couple dozen rows, so
it's not like these are huge sorts. In the case of our cascade DELETEs, there are typically
only a couple of rows with 2K CLOB columns that would be cascade-deleted, so they aren't
huge either.

I have been leaning towards a threading issue because I can issue these exact same queries
over and over at later points in time and they are fine. However, I have yet to be able to
replicate it with heavily threaded tests, with a dozen client processes with dozens of
threads each, so I am just not sure...
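
The threaded tests are roughly of this shape (a sketch only; the connection URL, query,
and counts are simplified stand-ins, and the real harness uses separate client processes):

    import java.sql.*;

    public class StressSketch {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.ClientDriver");
            Thread[] threads = new Thread[24];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            Connection conn = DriverManager.getConnection(
                                "jdbc:derby://localhost:1527/demoDB");
                            for (int j = 0; j < 1000; j++) {
                                Statement s = conn.createStatement();
                                // Same kind of small SELECT...ORDER BY that fails sporadically
                                ResultSet rs = s.executeQuery(
                                    "SELECT id, payload FROM child ORDER BY id");
                                while (rs.next()) { /* drain the rows */ }
                                rs.close();
                                s.close();
                            }
                            conn.close();
                        } catch (SQLException e) {
                            e.printStackTrace(); // hoping to catch the sporadic failure here
                        }
                    }
                });
                threads[i].start();
            }
            for (int i = 0; i < threads.length; i++) {
                threads[i].join();
            }
        }
    }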

Richard Hash
OpenSpirit Corporation

=========================================================
From: Bryan Pendleton (bpen...@gmail.com)
Sent: Sep 4, 2010 8:10:00 am
Subject: Question about how Derby uses it's "tmp/" sub-directory - and what to do when it
goes missing!

> I basically don't have a good understanding of what the "tmp/" directory
> is used for, or when it's created or goes away.

I suspect that you are encountering some sort of resource exhaustion
situation, which is provoking a Derby bug of some sort.

The "tmp" directory is used by the low-level storage subsystem of Derby,
for purposes such as:
  - holding intermediate results during large external merge-sort runs
  - holding intermediate results during query processing

So, for example, if you issue a query which causes Derby to sort an amount of data too
large for the sort to be performed entirely in memory, Derby falls back to an "external
merge-sort", which uses temporary files to hold the intermediate runs. A GROUP BY or
ORDER BY could cause this, and a complicated join may do it if Derby's optimizer chooses
a merge-join strategy.
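
As a concrete (hypothetical) example, a query shaped like this one forces Derby to sort,
and with a large enough table the sort runs spill into tmp/ (table and column names here
are made up):

    import java.sql.*;

    public class SortSpillDemo {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:derby:demoDB");
            Statement s = conn.createStatement();
            // With a large "orders" table and no index on customer_name,
            // Derby must sort the rows; once the sort no longer fits in
            // memory it becomes an external merge-sort backed by files
            // in the database's tmp/ directory.
            ResultSet rs = s.executeQuery(
                "SELECT customer_name, COUNT(*) FROM orders "
                + "GROUP BY customer_name ORDER BY customer_name");
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getInt(2));
            }
            rs.close();
            s.close();
            conn.close();
        }
    }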

Another case is when Derby's optimizer chooses a hash join strategy, but the table
chosen to be hashed is too large to fit in memory, so the hash table overflows to disk.
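
The memory budget the optimizer assumes for such a hash table is governed by the
derby.language.maxMemoryPerTable property (in kilobytes; the default is 1024), so
raising it makes overflow to disk less likely. A sketch, with an illustrative value:

    import java.sql.*;

    public class HashJoinMemory {
        public static void main(String[] args) throws Exception {
            // Must be set before the Derby engine boots; 4096 KB is only
            // an example value, not a recommendation.
            System.setProperty("derby.language.maxMemoryPerTable", "4096");
            Connection conn = DriverManager.getConnection("jdbc:derby:demoDB");
            // ... run the join that was overflowing to disk ...
            conn.close();
        }
    }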

I think there are also cases where scrollable updatable cursors can cause in-memory
hash tables to overflow to disk.
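
In JDBC terms that is a cursor opened along these lines (standard JDBC calls; the query
and table name are made up):

    import java.sql.*;

    public class ScrollableCursorDemo {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:derby:demoDB");
            // Scrollable + updatable: Derby keeps the rows it has seen in a
            // backing hash table so the cursor can move backwards, and that
            // table can overflow to files under tmp/ for large results.
            Statement s = conn.createStatement(
                ResultSet.TYPE_SCROLL_INSENSITIVE,
                ResultSet.CONCUR_UPDATABLE);
            ResultSet rs = s.executeQuery("SELECT id, payload FROM child");
            rs.afterLast();
            while (rs.previous()) { /* scroll backwards through the rows */ }
            rs.close();
            s.close();
            conn.close();
        }
    }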

The resources you might run out of that are relevant to these processing choices are
memory and open file descriptors.

I know that Derby has some theoretical problems in its handling of
open file descriptors for truly enormous sorts. See, for example:
https://issues.apache.org/jira/browse/DERBY-1679

The bottom line is: I think you are encountering a bug in Derby, and
I think you should file it in the Derby bug-tracking system and start
trying to gather as much information as you can, in order to get the
best chance of identifying and resolving it:
http://db.apache.org/derby/DerbyBugGuidelines.html
In the meantime, you *might* be able to work around your problem by either:
a) giving Derby substantially more memory, to reduce its need to perform
external merge-sorts, or
b) checking on whether you are hitting a file descriptor limit. If this
is a Unix-based system (Linux/Solaris/MacOSX/etc.), you may be able to
configure the system to give your process more file descriptors, which
could avoid a "too many open files" error if that's what's causing this.
The sketch below shows one way to check both limits from inside the JVM.
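
A sketch of that check (the file-descriptor counts rely on the com.sun.management
extension, so that part only works on Sun/Oracle JVMs on Unix-like systems; the heap
figure is portable, and can be raised with -Xmx):

    import java.lang.management.ManagementFactory;
    import com.sun.management.UnixOperatingSystemMXBean;

    public class LimitCheck {
        public static void main(String[] args) {
            // Maximum heap this JVM was granted; raise it with -Xmx.
            System.out.println("max heap bytes: "
                + Runtime.getRuntime().maxMemory());

            // Open vs. maximum file descriptors for this process.
            Object os = ManagementFactory.getOperatingSystemMXBean();
            if (os instanceof UnixOperatingSystemMXBean) {
                UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
                System.out.println("open fds: " + unix.getOpenFileDescriptorCount());
                System.out.println("max fds:  " + unix.getMaxFileDescriptorCount());
            }
        }
    }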

Hope this helps,

bryan

