db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Army <qoz...@sbcglobal.net>
Subject Re: [OPTIMIZER] OptimizerImpl "best plans" for subqueries?
Date Sat, 18 Feb 2006 01:10:29 GMT
> I think you've found a bug, but the solution you're trying may not be 
> correct. I believe the optimizer currently keeps track of only two "best 
> plans" - the best access path for each table in a query (or subquery) as 
> it's currently being considered in a join order, and the best overall 
> join order and path for each table for the best plan it's found so far. 

Am I right in thinking that the first of these two "best plans" is stored in 
"bestAccessPath", and the second is stored in "trulyTheBestAccessPath"?  That's 
how I read it, but I'd just like to make sure.

> As for the problems you're seeing in view.sql and refActions1.sql, I
> suspect it's due to the fact that in some queries there are certain join
> orders that won't work (e.g. if you pass a column from one table to the
> constructor for a VTI, the VTI must come after the other table in the
> join order). Your attempted fix is probably clobbering the original best
> plan for the subquery in a case where no subsequent best plan is
> possible.

Okay, this definitely makes sense.  I went back and looked at my quick-fix and I 
see that I was a bit overzealous--I reset not only the bestCost for the 
OptimizerImpl, but also the "foundABestPlan" flag and the "bestJoinOrder" field. 
  The fact that I cleared the foundABestPlan flag was what caused the error I 
mentioned.

That said, I then changed the code so that it only resets the bestCost field (by 
setting it to the max value possible); I left foundABestPlan and bestJoinOrder 
untouched.  When I did this, views.sql and refActions1.sql both passed.  At 
first I thought this may have just been dumb luck, but the more I think about 
it, the more I think this is the correct thing to do.  Since, as you said, it's 
possible that "there is no subsequent best plan" for the subquery, we need to 
keep hold of the fact that we did, at some point, find a best plan, and we need 
to know what the corresponding join order is.  That said, resetting bestCost to 
the max value does two things:

  1. It ensures that if at least one best plan is found for the current subquery 
with respect to the current join order of the outer query, that best plan will 
be saved as the best plan for the subquery and it's cost will be returned 
accordingly.

  2. It ensures that if no best plan is found, then the cost for the subquery 
will be outrageously high (Double.MAX_VALUE) which means the outer query will 
not choose its current join order--which is good, because the subquery can't be 
executed with that outer join order.  Further, we will still know that there was 
at some point some outer join order for which the subquery had a valid best plan 
(because foundABestPlan will still be set) and we will still know the subquery's 
join order for that best plan (because bestJoinOrder will still be set).

Number 1 would, I believe, fix the issue that started this thread, because the 
"1 million" plan would be saved (1 million is less than Double.MAX_VALUE) and 
then rejected as too expensive.  Number 2 would, I believe, address your comment 
that there may in fact be "no subsequent best plan", which means the subquery 
must know that it did have a valid plan at one point, and it must know what the 
join order for that plan was.  The only information we lose is the cost of that 
earlier plan--but the cost should already have been viewed and handled by the 
outer query at the time it was first returned, so I don't think it's required 
any more...?  The fact that views.sql and refActions1.sql pass with this change 
suggests that this is along the right lines; I would of course like to have 
someone confirm if I'm seeing this correctly...

Does that sound right, or am I being too optimistic?

> For subqueries it may need to keep track of another "best plan" - the 
> plan to use with the best plan found for the outer query.
> 
> I'll have to think about whether it gets more complicated than this if 
> subqueries are nested. Would the optimizer have to keep track of even 
> more levels of "best plan" in cases like this (I hope not).

I've been struggling to wrap my head around this question, as well (at least, I 
*think* this is the question with which I've been struggling ;)  And ultimately, 
I think this ties back to what I was trying to do with the Phase 1 patch for 
DERBY-805--namely, keep track of "the plan to use with the best plan found for 
the outer query".  But based on this discussion, I wonder if the patch I posted 
goes far enough--if there are subqueries beneath subqueries, such as can happen 
if we have unions beneath unions beneath unions, how do we get each of those 
subqueries to remember its best plan with respect to _every_ level of outer 
query (there could be several) that sits above it?

That's the question I've been pondering for the past two days; is that also what 
you are referring to by the above "I'll have to think about" statement?  Or am I 
taking this in a different direction?

In any event, since it appears that this is in fact an issue with the optimizer, 
I'll file a Jira entry for tracking purposes.

Thanks for the reply, Jeff.  This was very helpful...
Army


Mime
View raw message