db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Newsham" <jnews...@referentia.com>
Subject RE: slow subqueries
Date Mon, 13 Nov 2006 20:27:47 GMT

> Army wrote:
> > In order to get Derby to recognize the equijoin predicate, the subquery
> > has to appear in the FROM list.  So I created a view for the subquery:
> <snip view>
> > and then I changed the query to select and join with that view:
> <snip>
> > Working off the modified query above I used "optimizer overrides", which
> > were added as part of Derby 10.2, to 1) fix the join order so that the
> > subquery becomes the "inner" table, and 2) to force ("guarantee") a hash
> > join using the equijoin predicate, thereby allowing materialization of
> > the inner table (i.e. of the subquery).  The query to do that is as
> > follows:
> >
> > select count(*) from --DERBY-PROPERTIES joinOrder=FIXED
> >   time, v_hash --DERBY-PROPERTIES joinStrategy=HASH
> >   where time.id = v_hash.id;
> Just one thing to mention as an afterthought: you do not have to create a
> separate view for this to work.  I did that for the sake of clarity, but
> you
> could just as easily write:
> select count(*) from --DERBY-PROPERTIES joinOrder=FIXED
>    time,
>    (select distinct
>      time.id from time, double_sample, band
>     where band.id = double_sample.fk_band_id
>       and double_sample.fk_time_id = time.id and
>       band.id in (11, 12)
>    ) v_hash --DERBY-PROPERTIES joinStrategy=HASH
>    where time.id = v_hash.id;
> Of course after writing all of that I realized that you said your queries
> are
> generated (as opposed to hand-written) and that they "are only
> representative of
> my problem. [...] While the above sample query can be easily modified to
> run
> well, I'm not sure how to do it in the general case."
> So maybe this isn't going to work for you, after all...

Hi Army,

Thanks for the equijoin/subquery example, it makes sense to me now.
Regarding the optimizer override, I should have no problem getting those
hints into my sql statement as I'm generating the sql statements myself, and
they all follow a similar structure.  So on the contrary, your above idea
seems very feasible.  

I have three possible workaround ideas, but I'm going to pursue this one
first, as it seems likely to work reasonably well while not taking too much
development time, which is a critical issue now.  (The other ideas are to
simulate a subquery by iterating over two sorted queries in parallel; and to
use a temporary table for inner query results, using an equijoin with the
outer query).

You previously mentioned a Derby property called
derby.language.maxMemoryPerTable.  I don't see it mentioned in the current
documentation, but I found a jira issue which contains some proposed
documentation (http://issues.apache.org/jira/browse/DERBY-1397).  But it's
still not entirely clear to me.  Is this property only used when deciding
whether to perform a hash join or nested loop join?  Or does it also
influence how much of the hash join is kept in memory, or when that hash
join is spilled to disk?  Essentially, I'd like to know, if using the above
query with optimizer overrides which force a hash join, what is the
threshold for spilling a hash to disk, and is this a tunable parameter?



View raw message