db-derby-user mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Workarounds for too many open files?
Date Mon, 28 Nov 2005 19:03:16 GMT
The easiest workaround is to raise the open file limit - the extra files
should only be needed during the sort.  How to do this is very
OS-specific; to my knowledge Java gives no visibility into this resource.
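
For what it's worth, on Linux you can at least watch the descriptor count
from inside the JVM by listing /proc/self/fd.  A rough sketch (Linux-only
assumption, nothing Derby-specific):

  import java.io.File;

  public class FdCount {
      public static void main(String[] args) {
          // Linux-specific: each entry under /proc/self/fd is an open
          // file descriptor belonging to this process.
          File fdDir = new File("/proc/self/fd");
          String[] fds = fdDir.list();
          if (fds == null) {
              System.out.println("/proc/self/fd not available on this OS");
          } else {
              System.out.println("open file descriptors: " + fds.length);
          }
      }
  }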

I believe the problem is the sort algorithm used to create the index,
not the index itself.  It uses a multi-level merge strategy where each
merge group is a separate file (once Derby has determined that it is
going to do a disk-based sort rather than an in-memory sort).
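
To make the shape of the problem concrete, here is a rough, very
simplified sketch of that pattern - not Derby's actual sorter, just the
generic one-temp-file-per-run approach, which is where the open
descriptor count comes from (every run file stays open for the whole
final merge):

  import java.io.*;
  import java.util.*;

  public class OneFilePerRunSort {
      // Write each sorted run to its own temp file (one "merge group" per file).
      static List<File> writeRuns(Iterator<String> input, int runSize)
              throws IOException {
          List<File> runs = new ArrayList<File>();
          while (input.hasNext()) {
              List<String> buf = new ArrayList<String>();
              while (input.hasNext() && buf.size() < runSize) {
                  buf.add(input.next());
              }
              Collections.sort(buf);
              File run = File.createTempFile("run", ".tmp");
              PrintWriter out = new PrintWriter(new FileWriter(run));
              for (String s : buf) {
                  out.println(s);
              }
              out.close();
              runs.add(run);
          }
          return runs;
      }

      // Merge every run at once: each run file is held open until the merge
      // finishes, so the number of open descriptors equals the number of runs.
      static void mergeAll(List<File> runs, Writer out) throws IOException {
          List<BufferedReader> readers = new ArrayList<BufferedReader>();
          PriorityQueue<String[]> heap = new PriorityQueue<String[]>(
                  Math.max(1, runs.size()),
                  new Comparator<String[]>() {
                      public int compare(String[] a, String[] b) {
                          return a[0].compareTo(b[0]);
                      }
                  });
          for (int i = 0; i < runs.size(); i++) {
              BufferedReader r = new BufferedReader(new FileReader(runs.get(i)));
              readers.add(r);
              String line = r.readLine();
              if (line != null) heap.add(new String[] { line, String.valueOf(i) });
          }
          while (!heap.isEmpty()) {
              String[] top = heap.poll();
              out.write(top[0]);
              out.write('\n');
              String next = readers.get(Integer.parseInt(top[1])).readLine();
              if (next != null) heap.add(new String[] { next, top[1] });
          }
          for (BufferedReader r : readers) {
              r.close();
          }
      }
  }

A 100gb sort produces a lot of runs, and with this pattern each one
costs a descriptor until the merge completes.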

I have not debugged this, other than verifying that upping the open
file count allows the index to be created and the temp files to be
cleaned up.  Unlike the normal open files, which have a cache to limit
how many are open at one time, my guess is that the sort just keeps all
of its files open, since they tend to be filled and then drained almost
immediately.  Caching the opens would conserve the open file resource
but slow down the sort.  Alternate sort/merge strategies could be used
that do not need one file per merge group.  It may also be that we
should increase the size of each merge pass when dealing with such a
big sort.
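
As a sketch of how a bounded merge fan-in would keep the descriptor
count down (again just an illustration building on the hypothetical
mergeAll() method from the sketch above, not a proposal for the real
code): merge only k runs at a time into a new run and repeat, so that
no more than k input files plus one output file are ever open at once.

  import java.io.*;
  import java.util.*;

  public class BoundedFanInMerge {
      // Repeatedly merge at most k runs into a new run until one run is left.
      static File mergeWithFanIn(List<File> runs, int k) throws IOException {
          List<File> current = new ArrayList<File>(runs);
          while (current.size() > 1) {
              List<File> next = new ArrayList<File>();
              for (int i = 0; i < current.size(); i += k) {
                  List<File> group =
                          current.subList(i, Math.min(i + k, current.size()));
                  File merged = File.createTempFile("merge", ".tmp");
                  Writer out = new BufferedWriter(new FileWriter(merged));
                  // At most k run files are open here, plus the output file.
                  OneFilePerRunSort.mergeAll(group, out);
                  out.close();
                  for (File f : group) {
                      f.delete();
                  }
                  next.add(merged);
              }
              current = next;
          }
          return current.get(0);
      }
  }

The cost, as mentioned, is extra passes: the same rows get rewritten
roughly log base k of (number of runs) times, so a bigger k (a bigger
merge pass) trades open descriptors for speed.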

Not very much work has been done on the performance of the disk-based
sort, and especially not on 100gb sorts.  Anyone interested in doing
some development work on Derby may want to look at sorts.  It is a
module where one could write a completely separate implementation and
easily compare/test one's changes without worrying about any other
part of the code.

This is definitely an area that probably can be improved, as it has
not changed much since its original implementation, back when 100gb
databases were just a dream (I can verify it was developed on machines
where 1 gb of disk space was a luxury).

Lars Clausen wrote:
> On Fri, 2005-11-25 at 09:14, Lars Clausen wrote:
> 
>>Trying to import a 10GB text file (about 50x10^6 entries) into a single
>>Derby table, I got the following error:
>>
>>ij> connect 'jdbc:derby:cdxdb';
>>ij> elapsedtime on;
>>ij> CALL SYSCS_UTIL.SYSCS_IMPORT_DATA ( null, 'CDX',
>>'URL,IP,MIMETYPE,LENGTH,ARCFILE,OFFSET', '1,2,4,5,6,7',
>>'/home/lc/index-backping.cdx', '`', null, null, 1);
>>ERROR 38000: The exception 'SQL Exception: Exception during creation of
>>file
>>/home/lc/projects/webarkivering/scripts/sql/cdxdb/tmp/T1132842374093.tmp
>>for container' was thrown while evaluating an expression.
>>ERROR XSDF1: Exception during creation of
>>file
>>/home/lc/projects/webarkivering/scripts/sql/cdxdb/tmp/T1132842374093.tmp
>>for container
>>ERROR XJ001: Java exception:
>>'/home/lc/projects/webarkivering/scripts/sql/cdxdb/tmp/T1132842374093.tmp
>>(Too many open files): java.io.FileNotFoundException'.
> 
> 
> It turns out that this happens during index creation.  I was able to
> import the text file and run selects on it, but when I try to create an
> index:
> 
> ij> select count(*) from cdx;
> 1
> -----------
> 50000000
>  
> 1 row selected
> ELAPSED TIME = 320818 milliseconds
> ij> create index cdxurl on cdx(url);
> ERROR XSDF1: Exception during creation of file
> /home/lc/projects/webarkivering/scripts/sql/cdxdb/tmp/T1132927896412.tmp
> for container
> ERROR XJ001: Java exception:
> '/home/lc/projects/webarkivering/scripts/sql/cdxdb/tmp/T1132927896412.tmp
> (Too many open files): java.io.FileNotFoundException'.
> ij>
> 
> Derby creates files in the tmp directory at a rate of about 8 per
> second.  If it doesn't close all of these, it would run out of FDs
> (ulimit 1024) before long.
> 
> I would file a bug report, but db.apache.org isn't responding.
> 
> -Lars

