db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Army <qoz...@gmail.com>
Subject Re: Diagnosing DERBY-47 and DERBY-713 (queries with IN (?,?) filters)
Date Mon, 02 Oct 2006 20:55:02 GMT
James Synge wrote:

> Yes, though I've been finding it hard to understand the manner in which the
> the query plan is transformed by the optimizer.  For example, how
> an index scan gets inserted above (?) a base table, 

The query plan that you see is generated as part of the "modifyAccessPaths" 
phase of compilation.  *After* the optimizer has found a best access plan for a 
query, it then goes through and calls "modifyAccessPaths" on all of the nodes in 
the query tree.  It is during this phase that the internal nodes (ex. 
IndexToBaseRowNode) are generated.

For tracing purposes, take a look at DMLStatementNode.optimize().  In that 
method you can see a nice breakdown of the different phases of optimization 
(comments here are my own):

// Phase 1: Pre-process (occurs after binding has finished):
resultSet = resultSet.preprocess(getCompilerContext().getNumTables(),
		null, (FromList) null);

// Phase 2: Optimize (see below)
resultSet = resultSet.optimize(getDataDictionary(), null, 1.0d);

// Phase 3: Modify access paths, i.e. generate the nodes required
// to execute the access plan chosen in Phase 2.
resultSet = resultSet.modifyAccessPaths();

If you put a break point at the call to modifyAccessPaths() and then step into 
it, that might help you understan more about how the final query tree is 
generated based on optimizer decisions.

> and more generally, what are all the ways that this kind of alternative 
> approach can be added to the mix and evaluated.

The "alternate approaches" which are evaluated during optimization consist 
primarily of three things: a) join order (which table comes first), b) join 
strategy (hash or nested loop), and c) index usage (which index, if any, are we 
going to use).

For a general description of how the optimizer evaluates and chooses which 
"access path" is best, you might find it helpful to read "Section III" of the 
following two documents:

DERBY-781_v1.html (attached to DERBY-781)
DERBY-805_v5.html (attached to DERBY-805)

These writeups are admittedly biased toward the specific issues to which they 
are attached, but perhaps there's some info there to help you understand how the 
different query plans ("approaches") are evaulated on a more general scale...

> I'm beginning to see that one alternative access path that we could 
> generate would introduce a fake resultset that would be used as the 
> outer table in a nested loop join with the base table (or one of its 
> indices) as the inner table.  What I definitely can't see yet is how 
> to "slip" that in to the optimizer.

I haven't dug into this approach enough to comment one way or the other, but as 
a general knee-jerk reaction I think the place to do this kind of thing would be 
during the pre-processing phase.  If you are able "transform" the 
InListOperatorNode into a meaningful ResultSet during pre-processing, then you 
should ideally be able to feed that to the optimizer and let it (the optimizer) 
do its normal "magic" to figure out what join strategy to use and whether or not 
to use an index.  But again, I say that without having thought about your 
suggested approach in any detail...

Hopefully that helps get you started, but if you have more questions on how the 
overall optimize process works, please do keep asking...

Army


Mime
View raw message