db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "James Synge" <james.sy...@gmail.com>
Subject Re: Diagnosing DERBY-47 and DERBY-713 (queries with IN (?,?) filters)
Date Mon, 02 Oct 2006 19:52:10 GMT
> From: Bryan Pendleton <bpendleton@amberpoint.com>
> > I'd like to fix DERBY-47 and DERBY-713 (essentially the same bug), and
> > so have been working to understand what is going wrong.
> Great! Welcome to the community!


> > as bytecode.  Can someone suggest a means of debugging the execution
> > phase (or getting a human readable form of the code)?
> I don't know the answer to this, but I think it's a great place to start.
> Right now, I think that you can basically work with query plan dumps,
> as described at http://wiki.apache.org/db-derby/PerformanceDiagnosisTips
> and at http://db.apache.org/derby/docs/10.2/tuning/ttundepth33391.html

I've used the query plan dumping, and also gathered the optimizer
trace.  I didn't need to disable the optimizer timeout because that
apparently only kicks in if you have more than N (6?) tables in your
query, and in my examples I have at most 2.

The problem in this case seems to be rooted in the fact that the IN list
predicate is used to perform an index scan.  I'm sure that there are
cases where it makes perfect sense (e.g. where the IN list predicate
is evaluated after a lot of other filtering has been done), but this isn't one.

> It seems like working at the query plan level is appropriate for DERBY-47
> and DERBY-713; that is, it seems to me that if we can concisely express
> the current query plan that Derby uses, and can concisely express the
> desired query plan, we will be well on the way to describing the changes
> that are needed.
> Is this the direction you were heading?

Yes, though I've been finding it hard to understand the manner in which the
the query plan is transformed by the optimizer.  For example, how
an index scan gets inserted above (?) a base table, and more generally,
what are all the ways that this kind of alternative approach can be added
to the mix and evaluated.

I'm beginning to see that one alternative access path that we could generate
would introduce a fake resultset that would be used as the outer table
in a nested loop join with the base table (or one of its indices) as the inner
table.  What I definitely can't see yet is how to "slip" that in to
the optimizer.


View raw message