db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Newsham" <jnews...@referentia.com>
Subject slow subqueries (was: StackOverflowError)
Date Fri, 10 Nov 2006 01:46:26 GMT

Hi everyone,

The stack overflow problem went away after I upgraded to 10.2.1.6.  There's
no longer an error running the query, however, the query is very slow.  To
be honest, I haven't run the query to completion, as my computer makes a
prohibitively annoying sound when the CPU is pegged and I have to kill derby
after a number of minutes.  But I know the query runs for many, many minutes
without completing.  Why do queries with subqueries perform so poorly?

The following query completes almost instantly.  I suppose Derby simplifies
the subquery internally to "band.id = 11":

ij(SAMPLEBASE)> select count(*) from time where time.id in (select distinct
time.id from time, double_sample, band where band.id =
double_sample.fk_band_id and double_sample.fk_time_id = time.id and band.id
in (11));

This query "does not complete":

ij(SAMPLEBASE)> select count(*) from time where time.id in (select distinct
time.id from time, double_sample, band where band.id =
double_sample.fk_band_id and double_sample.fk_time_id = time.id and band.id
in (11, 12));

I've been searching/reading around and trying to understand a little about
how Derby works and how to check the statement plan.  I've read through the
tuning document.  Unfortunately if a query doesn't complete, I can't get the
statement plan.  

Is the entire inner query being re-evaluated for every candidate record in
the outer query?  I read somewhere that inner queries may not be
materialized if they are larger than a certain threshold.  But if the outer
query is also very large, then this seems a poor strategy.  If the inner
result set is too large for memory, couldn't it be cached to disk?  

But a much better optimization would be for Derby to notice that the outer
variable (time.id) and the result of the inner query (time.id) are both
indexed/unique (already sorted?), and then to internally generate two
sub-resultsets which are processed together in parallel, instead of using a
very expensive nested loop.

The above queries are only representative of my problem.  I have a generic
API which uses a sql database under the covers.  Sql queries are generated
dynamically.  The queries only involve a small number of tables, but will
typically involve subqueries, unions, intersects.  While the above sample
query can be easily modified to run well, I'm not sure how to do it in the
general case.

Any feedback or responses to the above comments are welcome.

Regards,
Jim

> -----Original Message-----
> From: Rajesh Kartha [mailto:kartha02@gmail.com]
> Sent: Monday, November 06, 2006 6:04 PM
> To: Derby Discussion
> Subject: Re: StackOverflowError
> 
> 
> Hello Jim,
> 
> Is it possible to try this query out with the latest GA release 10.2.1.6 ?
> 
> http://db.apache.org/derby/derby_downloads.html#Latest+Official+Release
> 
> There have been some fixes in the related code generation areas for 10.2:
> DERBY-766, DERBY-1714 etc.
> 
> Also, can post the schema that can  reproduce this issue ?
> 
> Regards,
> Rajesh
> 
> 
> 
> 
> 
> Jim Newsham wrote:
> 
> >Hi everyone,
> >
> >I thought the problem would go away if I gave Derby a better written
> query,
> >so I fixed my query generator to be a bit smarter.  Unfortunately, I
> still
> >get a stack overflow error.  Here's the new query:
> >
> >ij(SAMPLEBASE)> select count(*) from time where time.id in (select
> time.id
> >from time, double_sample, band where band.id = double_sample.fk_band_id
> and
> >double_sample.fk_time_id = time.id and (band.id = 39 or band.id = 55));
> >1
> >-----------
> >ERROR 38000: The exception 'java.lang.StackOverflowError' was thrown
> while
> >evaluating an expression. SQLSTATE: XJ001: Java exception: ':
> >java.lang.StackOverflowError'.
> >
> >
> >This looks like an ugly bug for such a simple query.  I didn't find any
> bug
> >in jira which seemed to relate to this.  Is this a known bug?
> >
> >Any advice on how to work around the problem is appreciated.
> >
> >I'm using Derby 10.1.3.1.
> >
> >Thanks,
> >Jim
> >
> >P.S.  I just included the outer query ("select count(*) from..") to
> >reproduce the problem in ij.  My program actually uses jdbc, executes the
> >inner query, and calls ResultSet.last().  The result is the same, a
> >StackOverflowError.  Here's the stack trace I get in my app:
> >
> >Caused by: org.apache.derby.client.am.SqlException: The exception
> >'java.lang.StackOverflowError' was thrown while evaluating an expression.
> >SQLSTATE: XJ001: Java exception: ': java.lang.StackOverflowError'.
> >        at org.apache.derby.client.am.ResultSet.completeSqlca(Unknown
> >Source)
> >        at
> >org.apache.derby.client.net.NetResultSetReply.parseFetchError(Unknown
> > Source)
> >        at
> >org.apache.derby.client.net.NetResultSetReply.parseCNTQRYreply(Unknown
> >Source)
> >        at
> >org.apache.derby.client.net.NetResultSetReply.readPositioningFetch(Unknow
> n
> >Source)
> >        at
> >org.apache.derby.client.net.ResultSetReply.readPositioningFetch(Unknown
> >Source)
> >        at
> >org.apache.derby.client.net.NetResultSet.readPositioningFetch_(Unknown
> >Source)
> >        at org.apache.derby.client.am.ResultSet.getRowCount(Unknown
> Source)
> >        at org.apache.derby.client.am.ResultSet.lastX(Unknown Source)
> >        at org.apache.derby.client.am.ResultSet.last(Unknown Source)
> >        at
> >com.referentia.sdf.monitor.samplebase.derby.QueryDataSet.getSize(QueryDat
> aSe
> >t.java:139)
> >
> >
> >
> >
> >>-----Original Message-----
> >>From: Jim Newsham [mailto:jnewsham@referentia.com]
> >>Sent: Monday, November 06, 2006 11:21 AM
> >>To: 'Derby Discussion'
> >>Subject: StackOverflowError
> >>
> >>
> >>Any reason why I should get a stack overflow error with the following
> >>query?
> >>
> >>Yes, I know the query is a bit odd... it's not hand-written.  The query
> >>generator could be optimized.  Nevertheless... is the stack overflow
> here
> >>considered a bug or a limitation?  If limitation, what specifically is
> the
> >>limitation?
> >>
> >>
> >>ij(SAMPLEBASE)> select count(*) from time where time.id in (select
> time.id
> >>from time, double_sample, band where band.id = double_sample.fk_band_id
> >>and
> >>double_sample.fk_time_id = time.id and band.id = 57 union select id from
> >>time where 1 = 0);
> >>1
> >>-----------
> >>ERROR 38000: The exception 'java.lang.StackOverflowError' was thrown
> while
> >>evaluating an expression. SQLSTATE: XJ001: Java exception: ':
> >>java.lang.StackOverflowError'.
> >>
> >>Thanks,
> >>Jim
> >>
> >>
> >>
> >>
> >
> >
> >
> >
> >
> >
> 




Mime
View raw message