db-derby-user mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Thread Interruption in large Database
Date Mon, 16 Jun 2008 20:20:59 GMT
PhilCope wrote:
> I have a large cloudscape 5 database (over 5million records) and I have found
> that when a ResultsSet includes all or many of these records, the initial
> call to ResultsSet.next() takes a very long (but finite) time (I would
> estimate about 10-15mins) . I can (indeed have) arrange the code so that
> this call occurs in a separate java thread which can be interrupted from the
> UI thread. BUT, as you may know, java threads that are set to as interrupted
> continue to execute until either application code or some "system" calls
> actually check the interrupted state of the running thread.
> 
> So, given this background info, my questions on Derby are 
> 
> 1. Have any significant performance improvements been made such that, for
> databases of this size, migrating from cloudscape to Derby would provide a
> significantly better response time ?
> 
> 2. If not, are there any improvements to the responsiveness of the
> timeconsuming call to .next() to the setting of the interrupted flag on the
> current thread in Derby ?
> 
> Thanks 
> 
> Phil Cope
The issue is not really the time-consuming call to .next(), and not
necessarily the size of the result set. When you call next(), it has to
wait until query processing is ready to return the 1st row. In some cases
Derby can return the 1st row before it has finished processing the entire
query. For instance, I believe that if you just did a simple select of all
the rows from your 5 million row table, you would see that the 1st row
comes back very quickly. In other cases it may do a lot of processing
before it even
gets to the 1st row (imagine a query with no key that required the
db to process every row in the db and only the last row in the table
actually would be returned).  In other cases the semantics of the query
require the db to process all of the rows before it can return the 1st row.

Can you post the query? It may help people give you suggestions. If
possible, Derby tries to stream results out as it gets them, but there
are queries where all the rows have to be seen and processed before the
first row can be returned. The simplest example is a query with an
ORDER BY at the end. If there is no index that provides the ordering
of the ORDER BY, then Derby will process the whole query, throw all the
rows into the sorter, sort them all, and only then give you the first row
back. Sometimes this ORDER BY behavior can be worked around by creating
an index on the exact keys, in the same order as the ORDER BY. Also note
that, although it is not strictly necessary, the current Derby/Cloudscape
sorter algorithm will not return the 1st row of the sort before it has
finished sorting all the rows.
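
To make the index workaround above concrete, here is a minimal JDBC sketch.
The table name ORDERS, the columns CUSTOMER_ID and ORDER_DATE, and the index
name are all hypothetical, since the real query has not been posted. It only
shows the idea of declaring the index keys in the same order as the ORDER BY;
whether the optimizer then actually skips the sort is up to Derby and should
be checked against the real query.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

final class OrderByIndexSketch {

    // Builds CREATE INDEX DDL whose key columns are listed in exactly
    // the same order as the ORDER BY columns of the query.
    public static String createIndexSql(String indexName, String table,
                                        List<String> orderByColumns) {
        return "CREATE INDEX " + indexName + " ON " + table
                + " (" + String.join(", ", orderByColumns) + ")";
    }

    // Builds the matching SELECT ... ORDER BY over the same columns.
    public static String selectSql(String table, List<String> orderByColumns) {
        return "SELECT * FROM " + table
                + " ORDER BY " + String.join(", ", orderByColumns);
    }

    // Creates the index, runs the ordered query, and reads only the 1st row.
    // Assumption: the caller supplies an open Connection to the database.
    public static void printFirstRow(Connection conn, String table,
                                     List<String> cols) throws SQLException {
        try (Statement st = conn.createStatement()) {
            // Hypothetical index name derived from the table name.
            st.executeUpdate(createIndexSql("IDX_" + table + "_ORDER", table, cols));
            try (ResultSet rs = st.executeQuery(selectSql(table, cols))) {
                if (rs.next()) {
                    System.out.println("first row, column 1: " + rs.getObject(1));
                }
            }
        }
    }
}
```

For the hypothetical table above, the call would look like
printFirstRow(conn, "ORDERS", Arrays.asList("CUSTOMER_ID", "ORDER_DATE")).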

As queries get more complicated it may be harder and harder for derby to
return a row "early".
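
The difference between a query that can stream and one that has to see
every row first can be illustrated with a small toy model. This is not how
Derby is implemented; the class and method names are made up for the
example. It just counts how many source rows get read before the first
result can be handed back, once for a plain scan and once for a sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class FirstRowLatencySketch {

    /** Wraps a row source and counts how many rows have been pulled from it. */
    static final class CountingSource implements Iterator<Integer> {
        private final Iterator<Integer> rows;
        int pulled = 0;

        CountingSource(List<Integer> data) {
            this.rows = data.iterator();
        }

        @Override
        public boolean hasNext() {
            return rows.hasNext();
        }

        @Override
        public Integer next() {
            pulled++;
            return rows.next();
        }
    }

    /** Streaming scan: returns the first source row without reading further. */
    static int firstRowStreaming(CountingSource source) {
        return source.next();
    }

    /** Blocking sort: drains the whole source before any row can be returned. */
    static int firstRowSorted(CountingSource source) {
        List<Integer> buffer = new ArrayList<>();
        while (source.hasNext()) {
            buffer.add(source.next());
        }
        Collections.sort(buffer);
        return buffer.get(0);
    }

    public static void main(String[] args) {
        // A 1000-row "table" stored in descending order.
        List<Integer> table = new ArrayList<>();
        for (int i = 1000; i > 0; i--) {
            table.add(i);
        }

        CountingSource scan = new CountingSource(table);
        firstRowStreaming(scan);

        CountingSource sort = new CountingSource(table);
        firstRowSorted(sort);

        System.out.println("rows read before 1st result, scan: " + scan.pulled);
        System.out.println("rows read before 1st result, sort: " + sort.pulled);
    }
}
```

In this model the scan reads 1 row before returning and the sort reads all
1000, which is the same shape as the long wait on the first call to next().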

