db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Army <qoz...@gmail.com>
Subject Re: slow subqueries
Date Mon, 13 Nov 2006 17:26:12 GMT
Jim Newsham wrote:
> Thanks for the tips... both on getting compilation time, and getting the
> plan without running the full query.  Does the query plan really show up in
> the derby log?

It did for me, and I've always thought that was where it was supposed to go. 
Maybe I'm missing something about your environment, though...

> To turn my query into an equijoin subquery, would I just change "where x in
> (...subquery...)" to "where x = (...subquery...)"?  I believe I could do
> that above, if I change the outer query to return distinct results.

In Derby an equijoin predicate has to exist between two result sets in the FROM 
list of the query.  So I don't think changing "where x in " to "where x = " will 
be enough.

I did some (very quick) playing with the following query that you posted in an 
earlier thread:

select count(*) from time where time.id =
   (select distinct
     time.id from time, double_sample, band
    where band.id = double_sample.fk_band_id
      and double_sample.fk_time_id = time.id and
      band.id in (11, 12));

In order to get Derby to recognize the equijoin predicate, the subquery has to 
appear in the FROM list.  So I created a view for the subquery:

create view v_hash as
   select distinct
     time.id from time, double_sample, band
    where band.id = double_sample.fk_band_id
      and double_sample.fk_time_id = time.id and
      band.id in (11, 12);

and then I changed the query to select and join with that view:

select count(*) from time, v_hash where time.id = v_hash.id;

When I did that, Derby chose a query plan that put "time" as the inner table and 
then did a nested loop join with a table scan on "time"--but that's not what you 
wanted.  Which brings us to your next question:

> But if I understand what you wrote below correctly, this is a required
> condition but not a guarantee that a hash join will be used.  If the above
> transformation makes sense (including for performance) and there is a way to
> guarantee hash join materialization including caching to disk, I would love
> to hear about it.

Working off the modified query above I used "optimizer overrides", which were 
added as part of Derby 10.2, to 1) fix the join order so that the subquery 
becomes the "inner" table, and 2) to force ("guarantee") a hash join using the 
equijoin predicate, thereby allowing materialization of the inner table (i.e. of 
the subquery).  The query to do that is as follows:

select count(*) from --DERBY-PROPERTIES joinOrder=FIXED
   time, v_hash --DERBY-PROPERTIES joinStrategy=HASH
   where time.id = v_hash.id;

Note that a "return" is required after each Derby override.

At that point the query plan showed a hash join in which the view "v_hash" was 
materialized into memory--which is, I think, what you are looking for.

Are you working with 10.2, and if so would something like this work for you?  I 
did all of this with zero rows, so I have no idea how or if this will work for 
tables with *millions* of rows in them.  Might be worth a shot, though...

> In the end, we'd like to stick with Derby if we can, because it is pure Java
> and embedded, and this is a real big plus for deployment within a desktop
> application such as ours.  I'm hoping that I can come up with a feasible
> workaround for the subquery performance issue we have, and hope that in the
> near future Derby can make my hack obsolete.

If we're lucky the above workaround will make things more bearable from a 
performance standpoint.  It's a bit of a pain in that you would have to rewrite 
all of your queries, but I'll leave it up to you decide whether or not it's 
worth it...


View raw message