db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Army <qoz...@gmail.com>
Subject Re: slow subqueries
Date Mon, 13 Nov 2006 17:26:12 GMT
Jim Newsham wrote:
> Thanks for the tips... both on getting compilation time, and getting the
> plan without running the full query.  Does the query plan really show up in
> the derby log?

It did for me, and I've always thought that was where it was supposed to go. 
Maybe I'm missing something about your environment, though...

> To turn my query into an equijoin subquery, would I just change "where x in
> (...subquery...)" to "where x = (...subquery...)"?  I believe I could do
> that above, if I change the outer query to return distinct results.

In Derby an equijoin predicate has to exist between two result sets in the FROM 
list of the query.  So I don't think changing "where x in " to "where x = " will 
be enough.

I did some (very quick) playing with the following query that you posted in an 
earlier thread:

select count(*) from time where time.id =
   (select distinct
     time.id from time, double_sample, band
    where band.id = double_sample.fk_band_id
      and double_sample.fk_time_id = time.id and
      band.id in (11, 12));

In order to get Derby to recognize the equijoin predicate, the subquery has to 
appear in the FROM list.  So I created a view for the subquery:

create view v_hash as
   select distinct
     time.id from time, double_sample, band
    where band.id = double_sample.fk_band_id
      and double_sample.fk_time_id = time.id and
      band.id in (11, 12);

and then I changed the query to select and join with that view:

select count(*) from time, v_hash where time.id = v_hash.id;

When I did that, Derby chose a query plan that put "time" as the inner table and 
then did a nested loop join with a table scan on "time"--but that's not what you 
wanted.  Which brings us to your next question:

> But if I understand what you wrote below correctly, this is a required
> condition but not a guarantee that a hash join will be used.  If the above
> transformation makes sense (including for performance) and there is a way to
> guarantee hash join materialization including caching to disk, I would love
> to hear about it.

Working off the modified query above I used "optimizer overrides", which were 
added as part of Derby 10.2, to 1) fix the join order so that the subquery 
becomes the "inner" table, and 2) to force ("guarantee") a hash join using the 
equijoin predicate, thereby allowing materialization of the inner table (i.e. of 
the subquery).  The query to do that is as follows:

select count(*) from --DERBY-PROPERTIES joinOrder=FIXED
   time, v_hash --DERBY-PROPERTIES joinStrategy=HASH
   where time.id = v_hash.id;

Note that a "return" is required after each Derby override.

At that point the query plan showed a hash join in which the view "v_hash" was 
materialized into memory--which is, I think, what you are looking for.

Are you working with 10.2, and if so would something like this work for you?  I 
did all of this with zero rows, so I have no idea how or if this will work for 
tables with *millions* of rows in them.  Might be worth a shot, though...

> In the end, we'd like to stick with Derby if we can, because it is pure Java
> and embedded, and this is a real big plus for deployment within a desktop
> application such as ours.  I'm hoping that I can come up with a feasible
> workaround for the subquery performance issue we have, and hope that in the
> near future Derby can make my hack obsolete.

If we're lucky the above workaround will make things more bearable from a 
performance standpoint.  It's a bit of a pain in that you would have to rewrite 
all of your queries, but I'll leave it up to you decide whether or not it's 
worth it...

Army


Mime
View raw message