db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Army <qoz...@sbcglobal.net>
Subject Re: [OPTIMIZER] OptimizerImpl "best plans" for subqueries?
Date Sat, 18 Feb 2006 18:10:39 GMT
Jeffrey Lichtman wrote:
> I think it may be necessary when remembering "truly the best" access 
> path at any level of subquery to tell each table subquery to remember 
> its best access path as "truly the best" also, and to recurse through 
> all the nested subqueries to do this. I don't think this would be very 
> hard to implement. 

I was actually toying around with something like this yesterday, so it's nice to 
hear you suggest a similar approach :)  I wrote up a solution that tries to 
implement it; I ran derbyall last night with the changes and there were no 
failures.  I also ran my predicate pushing code locally with these changes 
applied (in place of the Phase 1 patch) and everything worked there, too.  So I 
think this is the right way to go--and it feels more robust to me than the first 
Phase 1 patch for DERBY-805.  So thanks for the ideas and reassurance...

> I only worry about what would happen with deeply-nested subqueries - this 
> could create an order 2**n algorithm (that is, one whose time would grow 
> exponentially with the number of levels of nesting).

I guess my answer to that is if we're dealing with deeply nested queries, the 
caller will probably be preparing the query once and then executing it multiple 
times, in which case the extra optimization cost would be an up-front, one time 
thing while the benefits from doing the extra work could lead to huge 
performance savings for every execution of the query thereafter (when combined 
with predicate pushdown, that is).  Depending on just how much overhead we're 
adding to the optimization phase, this seems like an acceptable trade-off to me...

The solution I've coded tries to do what you describe.  I'll outline it here and 
post the patch as a new "Phase 1" patch for DERBY-805 since as I said earlier, I 
think this issue is really the issue that Phase 1 is trying to address.

First, I added a new field called "optimizerToBestPlanMap" in FromTable, which 
is the parent class to all Optimizables.  This field (which is a HashMap) holds 
"truly the best access path" (TTBP) for the Optimizable with respect to every 
OptimizerImpl at or above it, where "at" refers to the OptimizerImpl to which 
the Optimizable directly belongs.

I then added a new method called "addOrLoadBestPlanMapping" to the Optimizable 
interface, implemented in FromTable.  The signature for this method is:

+	public void addOrLoadBestPlanMapping(boolean doAdd,
+		Optimizer optimizer) throws StandardException
+	{

If "doAdd" is true then we will take the Optimizable's trulyTheBestAccessPath 
field and put a copy of it into optimizerToBestPlanMap, with the key being the 
received optimizer.  If "doAdd" is false, then we will search for the hash map 
for a TTBP that corresponds to the received optimizer, and if we find one we 
will load that path information into the Optimizable's trulyTheBestAccessPath field.

This new method is over-written by the two Optimizable classes that can have 
children: namely, SingleChildResultSetNode and TableOperatorNode.  In each case 
the method will first call super.addOrLoadBestPlanMapping(), which adds/loads 
TTBP for the node itself, and then it will recursively call that method on the 
child result in two cases: 1) if the child is another Optimizable, then the call 
is made directly on that child; 2) if the child is a SelectNode, then we call a 
new method on SelectNode that will iterate through all Optimizables in its FROM 
list and recursively call addOrLoadBestPlanMapping().  By doing so we "recurse 
through all nested subqueries", as Jeff said. For example, in 
SingleChildResultSetNode we have the following:

+	/** @see Optimizable#addOrLoadBestPlanMapping */
+	public void addOrLoadBestPlanMapping(boolean doAdd,
+		Optimizer optimizer) throws StandardException
+	{
+		super.addOrLoadBestPlanMapping(doAdd, optimizer);
+		if (childResult instanceof Optimizable)
+		{
+			((Optimizable)childResult).
+				addOrLoadBestPlanMapping(doAdd, optimizer);
+		}
+		else if (childResult instanceof SelectNode)
+		{
+			((SelectNode)childResult).
+				addOrLoadBestPlanMappings(doAdd, optimizer);
+		}
+	}

The third thing I've done is change the "rememberAsBest()" method on the 
Optimizable interface to take an instance of Optimizer as an argument.  Then 
whenever an OptimizerImpl decides it has found a best overall path and calls 
rememberAsBest(), we make an additional call (within rememberAsBest()) to the 
new addOrLoadBestPlanMapping() method with "doAdd" set to true and passing in 
the optimizer.  This means that whenever an OptimizerImpl finds a best plan, 
every Optimizable in its list (including all Optimizables found recursively 
through subqueries) will remember what its TTBP was with respect to that 

And as a last step, I've added two calls to addOrLoadBestPlanMapping() to the 
OptimizerImpl class itself.  The first is made with "doAdd" set to true and is 
called whenever we place an Optimizable at a join position, just before we call 
"startOptimizing()".  We need this call in order to remember what TTBP for each 
Optimizable (including those within subqueries) is _before_ we begin optimizing, 
so that if the join order for the current OptimizerImpl ends up being worse than 
a previous join order, we can "revert" the Optimizable (and recursively any 
Optimizables from nested subqueries) back to its TTBP for the OptimizerImpl's 
previously-determined best join order.

The second call to addOrLoadBestPlanMapping() comes when we pull an Optimizable 
from its current join position, in which case doAdd is "false", which we means 
we take the Optimizable's TTBP corresponding to the current OptimizerImpl and 
load it into the Optimizable's trulyTheBestAccessPath field.  If the 
OptimizerImpl's most recent join order was the best one so far, TTBP will end up 
holding the plan that was saved for that join order; otherwise, TTBP will end up 
holding the "reverted" plan, which is the plan the Optimizable had for whatever 
join order was previously considered best.

Okay, so maybe that'not as clear as I'd like it to be.  However, when I post 
these changes to DERBY-805 (as Phase 1 "v2"), I will also update the DERBY-805 
document to include this description and a walk-through example showing how it's 
all supposed to work.  In the meantime, if anyone understands this description 
well enough to point out any glaring problems, that'd be great.  Otherwise, I'll 
work on cleaning up the patch and adding an example to DERBY-805.html, and will 
post both later today...

Many thanks to all who continue to follow these optimizer threads,

View raw message