db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Army <qoz...@gmail.com>
Subject Re: [jira] Commented: (DERBY-47) Some possible improvements to IN optimization
Date Tue, 19 Dec 2006 00:12:59 GMT
James Synge (JIRA) wrote:
> The basic question is "when" should I replace the IN predicate?  The two basic
> choices would seem to be during preprocessing or during getNextDecoratedPermutation.

Very good question.  I guess I don't really know what the answer here is.  As 
you say, the first way seems like it'd be easier to implement, so maybe that's a 
good one to start with.  At the very least that will allow you to familiarize 
yourself with the preprocessing/binding/rewrite code.  After that you could then 
look at adding logic to determine when to use the VTI and when to fall back to 
normal IN-list processing.

> The former would likely be simpler to implement, but would my force solution
> to always be used (which, given the poor cost estimates produced due to this
> bug, might be a good thing).

Just to be clear, can I confirm that when you say "poor cost estimates" you are 
talking about cost estimates like the following, which I see when I do:

   select * from admin.changes where id in (1, 20000)

The cost estimate that I see for this (when running against the db created in 
your test program) is:

Project-Restrict ResultSet (2):
Number of opens = 1
Rows seen = 20000
Rows filtered = 19998
restriction = true
projection = false
	constructor time (milliseconds) = 0
	open time (milliseconds) = 0
	next time (milliseconds) = 0
	close time (milliseconds) = 0
	restriction time (milliseconds) = 0
	projection time (milliseconds) = 0
	optimizer estimated row count:         1500.38
	optimizer estimated cost:        22099.70

Source result set:
	Table Scan ResultSet for CHANGES a <snip>

Is this similar to what you see?  The reason I ask is because there are some 
other issues (namely, DERBY-1905) where we are seeing *really* bad cost 
estimates--far more ridiculous than the one shown above.  So I just want to make 
sure we're on the same page w.r.t to what "poor" cost estimates means in the 
context of this issue.  In the long run maybe that's an irrelevant question, but 
I would at least like to make sure that I'm seeing the same estimates that you are.

> The latter would require replacing the Optimizable against which the IN
> predicate is being evaluated with a substitute that represents a nested
> join (probably).  

Are we replacing the Optimizable or the IN predicate? In your comments for 
DERBY-2152 you wrote that your proposed approach would transform a query like:

  SELECT * FROM tableA WHERE columnB IN (constC, ?, ?, constD)

into something like:

  SELECT * FROM tableA WHERE columnB IN
   (SELECT vti.column1 FROM new ArgsToRowsVTI(SomeArgs))

So in this case I think we're replacing the IN predicate itself, not the 
"Optimizable against which the IN predicate is being evaluated" (i.e. tableA)? 
Is that correct? Sorry if I'm being dense here.

In any event, "switching" between these two transformations as part of a call to 
getNextDecoratedPermutation() seems like it could be tough.  In the optimizer 
work that I've done there has always been the implicit assumption that the query 
being optimized is "constant" throughout optimization; i.e. that we're always 
optimizing the *same* set of result set nodes--all that changes is the join 
order, index choices, and join strategies.

That doesn't mean it's impossible to try to optimize different "rewrites" of the 
query--it just means that I haven't tried it so I don't know what it would 
entail.  My gut feeling, though, is that you'd be better off trying to do the 
query transformation as part of "preprocessing" as you mentioned above.

> I really don't have enough of a grasp of the sub-query optimization to
> know whether that is a good idea or not.

For what little it's worth, I can't help but wonder if it would be better to 
rewrite the query as:

    (SELECT vti.column1 FROM new ArgsToRowsVTI(SomeArgs)) x
  WHERE tableA.columnB = x.column1

This seems like something the optimizer would be better able to optimize.  But 
that said, I did a simple example query using a real table instead of the VTI 
and the optimizer seems to have treated the query similarly for both rewrites. 
So go with your proposed rewrite and see what happens :)

> The next question is how do I pass information to the VTI implementation
> regarding which parameters of the activation should it use as the values
> in the virtual table?

The InListOperatorNode should have a list of values to which it applies.  If you 
pass that list into the VTI constructor then I think you'll get the relevant 
values and parameters that apply to the IN-list...?

> I'm currently looking to add the following if/else branch to InListOperatorNode:
> 		else if ((leftOperand instanceof ColumnReference) &&
> 				 rightOperandList.containsAllConstantOrParameterNodes())
> 		{


I noticed that there's already the following "else if":

		else if ((leftOperand instanceof ColumnReference) &&

Is it true that anytime the existing "else-if" condition is true, your new 
"else-if" condition will be true, as well?  I.e a rightOperandList that contains 
"all constant nodes" would also always contain "all constant or parameter 
nodes", wouldn't it?  If so, then where would you add the new "else-if"?  If you 
add it before the existing one then the existing one will never get executed; if 
you add it after the existing one then your new VTI solution will only go into 
effect if the IN-list has at least one parameter.  Is that what you are planning?

I realize this email has more questions than answers, so my apologies for lack 
of helpful information.  Unfortunately, the particular area of the optimizer 
with which you are working (preprocessing and/or rewrite) is one that is still 
relatively unexplored for me...


View raw message