Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Dbderby Wiki" for change notification.
The following page has been changed by Army:
http://wiki.apache.org/dbderby/JoinOrderPermutations
New page:
As described on the [http://wiki.apache.org/dbderby/LanguageOptimize Optimizer Overview]
page, there are three primary methods in Derby's !OptimizerImpl class that encapsulate the
general process of trying to find the "best join order" for a given !OptimizerImpl's list
of Optimizables. Those three methods are:
 * {{{getNextPermutation()}}}
 * {{{getNextDecoratedPermutation()}}}
 * {{{costPermutation()}}}
This page describes the general flow of code as it occurs when a class such as !SelectNode
or !TableOperatorNode calls the getNextPermutation() method of an !OptimizerImpl.
== In Short ==
The call to "getNextPermutation()" tells an !OptimizerImpl to place an Optimizable at the
next available position in the current join order array. This method contains all of the logic
for adding an Optimizable to the join order, for removing an Optimizable from the join order,
and for selecting which Optimizable is next in line for placement/removal.
As part of this method the !OptimizerImpl also figures out which predicates (if any) in its
list of predicates can be evaluated at the current position in the join order. It then [http://wiki.apache.org/dbderby/PredicatePushdown
pushes] those predicates down to the Optimizable that it just placed, where they can in turn
be used for estimating the cost of that Optimizable at its current position.
== In the Code: Finding the next "permutation" ==
The first time a call to "getNextPermutation()" is made, the !OptimizerImpl has a list with
{{{numOptimizables}}} Optimizables in it, a (potentially-empty) list of predicates, and an
integer array whose values have all been initialized to -1. The array of integers represents
the current "join order" and its size is {{{numOptimizables}}}, as well.
For an index {{{p}}} such that {{{0 <= p < numOptimizables}}}, let {{{v}}} be the value
at position {{{p}}} in the integer array, where {{{0 <= v < numOptimizables}}}. Then
we know the following:
* If {{{(v == -1)}}} then the {{{(p+1)}}}'th position in the current join order is "empty",
meaning that no Optimizable has been assigned to that position.
* else the {{{(v+1)}}}'th Optimizable in the list has been assigned to the {{{(p+1)}}}th
position in the join order.
In the actual code for !OptimizerImpl, the array of integers is called {{{proposedJoinOrder}}}.
So if, as an example, we have an !OptimizerImpl with four Optimizables in its list and {{{proposedJoinOrder}}}
has the values [ 2, 0, 3, -1 ], then we know that:
* the 3rd Optimizable in the list is at the 1st position in the join order
* the 1st Optimizable in the list is at the 2nd position in the join order
* the 4th Optimizable in the list is at the 3rd position in the join order
* the 2nd Optimizable in the list has not yet been assigned a position in the join order
* the 4th position in the join order is unassigned
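As a small illustrative sketch (not Derby's actual code), the mapping encoded by such an array can be decoded like this, using -1 as the "unassigned" marker:

```java
// Illustrative sketch (not Derby code): decode a proposedJoinOrder-style
// array, where the value v at index p means "Optimizable v sits at
// join-order position p", and -1 means the position is unassigned.
public class JoinOrderDecode {
    public static String describe(int[] proposedJoinOrder) {
        StringBuilder sb = new StringBuilder();
        for (int p = 0; p < proposedJoinOrder.length; p++) {
            int v = proposedJoinOrder[p];
            sb.append("position ").append(p).append(": ")
              .append(v == -1 ? "unassigned" : "Optimizable " + v)
              .append("\n");
        }
        return sb.toString();
    }
}
```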
The term "permutation" is used to describe the different permutations of values that can exist
within {{{proposedJoinOrder}}}. Thus all of the following are permutations of {{{proposedJoinOrder}}}
for the above example:
{{{
[ -1 -1 -1 -1 ]
[ 0 -1 -1 -1 ]
[ 0 1 2 -1 ]
[ 0 1 2 3 ]
[ 0 1 3 2 ]
etc...
}}}
The call to "getNextPermutation()" tells the !OptimizerImpl to find another permutation of
{{{proposedJoinOrder}}}, with the general assumption that the "next" permutation is one which
has not yet been found. If all possible permutations have been exhausted, then the call to
"getNextPermutation()" returns false.
=== The Algorithm ===
The general algorithm used by an !OptimizerImpl to iterate through the various permutations
of a join order is shown below.
''NOTE'': This is the algorithm in its simplest form. There is additional code in {{{OptimizerImpl.getNextPermutation()}}}
that can change the flow of the algorithm, but for the sake of simplicity we do not address
that here (see "Permutation Jumping" below for more).
Let {{{p}}} represent the index at which we want to place the next Optimizable in {{{proposedJoinOrder}}}.
Let {{{v}}} represent the value at {{{proposedJoinOrder[p]}}}. Then the algorithm for a single
call to getNextPermutation() is as follows:
1. if {{{p < numOptimizables - 1}}} then let {{{p = p + 1}}}
1. Let {{{v = proposedJoinOrder[p] + 1}}}
1. if {{{v == numOptimizables}}} then skip to step 5.
1. if there exists any {{{i}}} such that the following two conditions are true, then let
{{{v = v + 1}}} and go back to step 3:
 * {{{0 <= i < p}}} AND
 * {{{proposedJoinOrder[i] == v}}}
1. if {{{proposedJoinOrder[p] >= 0}}} then let {{{proposedJoinOrder[p] = -1}}} (++)
1. if {{{v < numOptimizables}}} then let {{{proposedJoinOrder[p] = v}}} and return true
(DONE). (++)
1. Let {{{p = p - 1}}}
1. if {{{p < 0}}} then we've exhausted all possible permutations, so return false (DONE).
1. Go back to step 2
Note: '(++)' => Additional work is required to "place" or "remove" an Optimizable. See
the "Placing and Removing an Optimizable" section below.
As a simple example, let's assume we have an !OptimizerImpl with a list of two Optimizables.
Execution of the algorithm would be as follows. Note that before the first call to {{{getNextPermutation()}}},
{{{p}}} will be -1 and {{{proposedJoinOrder}}} will be [ -1 -1 ]. For each call to {{{getNextPermutation()}}}
we only show the steps that actually change the state; i.e. if a step is conditional and the
condition is not met, then we do not show anything for that step.
First call to {{{getNextPermutation()}}}:
{{{
1: p < (numOptimizables - 1), so let p = p + 1 = -1 + 1 = 0
2: v = proposedJoinOrder[0] + 1 = -1 + 1 = 0;
6: v < numOptimizables so proposedJoinOrder[0] = v = 0;
return true.
}}}
=> {{{p}}} is now 0, {{{proposedJoinOrder}}} is now [ 0 -1 ].
2nd call to {{{getNextPermutation()}}}:
{{{
1: p < (numOptimizables - 1), so let p = p + 1 = 0 + 1 = 1
2: v = proposedJoinOrder[1] + 1 = -1 + 1 = 0;
4: if (i = 0) then (i < p) AND (proposedJoinOrder[i] == v),
so let v = v + 1 = 0 + 1 = 1
6: v < numOptimizables so proposedJoinOrder[1] = v = 1;
return true.
}}}
=> {{{p}}} is now 1, {{{proposedJoinOrder}}} is now [ 0 1 ].
3rd call to {{{getNextPermutation()}}}:
{{{
2: v = proposedJoinOrder[1] + 1 = 1 + 1 = 2;
3: v == numOptimizables so skip to step 5
5: proposedJoinOrder[1] >= 0, so let proposedJoinOrder[1] = -1
7: p = p - 1 = 1 - 1 = 0
9: Go back to step 2
2: v = proposedJoinOrder[0] + 1 = 0 + 1 = 1;
5: proposedJoinOrder[0] >= 0, so let proposedJoinOrder[0] = -1
6: v < numOptimizables so proposedJoinOrder[0] = v = 1;
return true.
}}}
=> {{{p}}} is now 0, {{{proposedJoinOrder}}} is now [ 1 -1 ].
4th call to {{{getNextPermutation()}}}:
{{{
1: p < (numOptimizables - 1), so let p = p + 1 = 0 + 1 = 1
2: v = proposedJoinOrder[1] + 1 = -1 + 1 = 0;
6: v < numOptimizables so proposedJoinOrder[1] = v = 0;
return true.
}}}
=> {{{p}}} is now 1, {{{proposedJoinOrder}}} is now [ 1 0 ].
5th call to {{{getNextPermutation()}}}:
{{{
2: v = proposedJoinOrder[1] + 1 = 0 + 1 = 1
4: if (i = 0) then (i < p) AND (proposedJoinOrder[i] == v),
so let v = v + 1 = 1 + 1 = 2
3: v == numOptimizables so skip to step 5
5: proposedJoinOrder[1] >= 0, so let proposedJoinOrder[1] = -1
7: p = p - 1 = 1 - 1 = 0
9: Go back to step 2
2: v = proposedJoinOrder[0] + 1 = 1 + 1 = 2
3: v == numOptimizables so skip to step 5
5: proposedJoinOrder[0] >= 0, so let proposedJoinOrder[0] = -1
7: p = p - 1 = 0 - 1 = -1
8: p < 0 so return false.
}}}
=> {{{p}}} is now -1, {{{proposedJoinOrder}}} is now [ -1 -1 ].
And since the 5th call returned "false", we know that we found all of the different permutations
for {{{proposedJoinOrder}}} in this example.
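The algorithm and the walkthrough above can be reproduced with a small standalone sketch. This is illustrative Java only, not Derby's actual implementation (which is shown in the next section):

```java
// Standalone sketch of the permutation algorithm described above.
// Simplified and hypothetical; Derby's real logic lives in
// OptimizerImpl.getNextPermutation() and is interwoven with other work.
public class PermutationSketch {
    private final int numOptimizables;
    private final int[] proposedJoinOrder;
    private int p = -1;  // index of the most recently filled position

    public PermutationSketch(int numOptimizables) {
        this.numOptimizables = numOptimizables;
        this.proposedJoinOrder = new int[numOptimizables];
        java.util.Arrays.fill(proposedJoinOrder, -1);  // all unassigned
    }

    public boolean getNextPermutation() {
        if (p < numOptimizables - 1) p++;              // step 1
        while (p >= 0) {
            int v = proposedJoinOrder[p] + 1;          // step 2
            while (v < numOptimizables && alreadyUsed(v))
                v++;                                   // steps 3-4
            proposedJoinOrder[p] = -1;                 // step 5: unassign
            if (v < numOptimizables) {                 // step 6: place
                proposedJoinOrder[p] = v;
                return true;
            }
            p--;                                       // step 7
        }
        return false;                                  // step 8: exhausted
    }

    // Step 4: is value v already at an earlier position in the order?
    private boolean alreadyUsed(int v) {
        for (int i = 0; i < p; i++)
            if (proposedJoinOrder[i] == v) return true;
        return false;
    }

    public int[] currentOrder() { return proposedJoinOrder.clone(); }
}
```

Running this with two Optimizables yields exactly the five calls traced above: [ 0 -1 ], [ 0 1 ], [ 1 -1 ], [ 1 0 ], then false.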
=== Algorithm Implementation ===
If we extract out the relevant lines from {{{OptimizerImpl.getNextPermutation()}}}, steps
2 thru 9 of the above algorithm are implemented with the following code. Note that in this
code, {{{joinPosition}}} maps to {{{p}}} from the algorithm and {{{nextOptimizable}}} maps
to {{{v}}}.
{{{
    /*
    ** The join position becomes < 0 when all the permutations have been
    ** looked at.
    */
    while (joinPosition >= 0)
    {
        int nextOptimizable = 0;
        ...
        /* Find the next unused table at this join position */
        nextOptimizable = proposedJoinOrder[joinPosition] + 1;
        for ( ; nextOptimizable < numOptimizables; nextOptimizable++)
        {
            boolean found = false;
            for (int posn = 0; posn < joinPosition; posn++)
            {
                /*
                ** Is this optimizable already somewhere
                ** in the join order?
                */
                if (proposedJoinOrder[posn] == nextOptimizable)
                {
                    found = true;
                    break;
                }
            }
            ...
        }

        /*
        ** We are going to try an optimizable at the current join order
        ** position. Is there one already at that position?
        */
        if (proposedJoinOrder[joinPosition] >= 0)
        {
            ...
            /* Mark current join position as unused */
            proposedJoinOrder[joinPosition] = -1;
        }

        /* Have we exhausted all the optimizables at this join position? */
        if (nextOptimizable >= numOptimizables)
        {
            ...
            /*
            ** We have exhausted all the optimizables at this level.
            ** Go back up one level.
            */

            /* Go back up one join position */
            joinPosition--;
            ...
            continue;
        }

        /*
        ** We have found another optimizable to try at this join position.
        */
        proposedJoinOrder[joinPosition] = nextOptimizable;
        ...
        return true;
    }
    ...
    return false;
}}}
NOTE: Ellipses ("...") represent code that is not shown because it does not directly relate
to the algorithm in question.
I leave it as an exercise for the reader to walk through this code and to verify that it yields
the same results as shown in the "Algorithm" section above.
=== Permutation "Jumping" ===
As illustrated by the presence of so many ellipses ("...") in the code fragment from the preceding
section, there is a lot of code in getNextPermutation() that does not directly relate to the
simple algorithm for finding permutations. Most of the missing code deals with placing Optimizables
at specific positions in the permutation (see "Placing and Removing Optimizables" below).
The rest of the missing code deals with situations in which getNextPermutation() takes one
of the following alternate codepaths:
1. Skips over permutations that !OptimizerImpl knows will lead to suboptimal join orders
(this is known as "cost-based search space pruning").
1. Jumps to a specific permutation that it thinks will lead to the best overall join order.
1. Quits looking for new permutations because [http://wiki.apache.org/dbderby/OptimizerTimeout
timeout] occurs.
As an example of the first scenario ("cost-based pruning"), assume we have an !OptimizerImpl
with four Optimizables in its list. Assume also that our best join order so far is [ 1 0 2
3] with some estimated cost EC, and that the next call to "getNextPermutation()" returns the
permutation [ 1 2 -1 -1 ]. If the cost of this *partial* join order is already greater than
EC, then there is logic in getNextPermutation() to effectively skip all other permutations
that begin with [ 1 2 ] since we know they will be more expensive than the best join order
so far. So in this case all of the following permutations would be skipped:
{{{
[ 1 2 0 -1 ]
[ 1 2 0 3 ]
[ 1 2 3 -1 ]
[ 1 2 3 0 ]
}}}
By skipping the costing phase for each of these permutations we can save time in the optimization
process, which allows the !OptimizerImpl to try out more potentially-better permutations before
timeout occurs.
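The pruning test itself can be sketched as follows. This is a hypothetical simplification; in the real code the check is interwoven with the rest of the bookkeeping in getNextPermutation():

```java
// Hypothetical sketch of cost-based search space pruning. If the cost
// of a *partial* join order already exceeds the cost of the best
// complete join order found so far, then no completion of that partial
// order can be cheaper, so every permutation extending it can be
// skipped without being costed.
public class PruneSketch {
    public static boolean shouldPrune(double partialCost, double bestCompleteCost) {
        return partialCost > bestCompleteCost;
    }
}
```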
In the second scenario, which we refer to as the "Jumping" scenario, a call to getNextPermutation()
"jumps" to a specific target permutation that is deemed "more likely" to lead to the best
overall join order. There are two different situations where this can happen.
'''Jump Case 1:'''
The first "Jump" situation occurs when the total number of Optimizables in the '''entire'''
query (meaning the combined sum of the number of Optimizables in all !OptimizerImpls) is greater
than six (6). In this case getNextPermutation() will operate as normal until the first complete
join has been found and an associated cost estimate has been determined. At that point getNextPermutation()
then orders the list of Optimizables based on the estimated row count for each Optimizable,
in order of least to greatest. That ordering becomes the "target" join order to which we want
to "jump" because the code guesses that such a join order is more likely to provide the lowest
overall cost estimate. This means that subsequent calls to getNextPermutation() will return
permutations which build up to the target join order.
Let's again take an example where the !OptimizerImpl has four Optimizables in its list. getNextPermutation()
runs as normal until it finds a complete join order, so we'll see the following permutations
for the first four calls to getNextPermutation():
{{{
1st call: [ 0 -1 -1 -1 ]
2nd call: [ 0 1 -1 -1 ]
3rd call: [ 0 1 2 -1 ]
4th call: [ 0 1 2 3 ]
}}}
At this point we have a complete join order with some estimated cost. Now let's say that the
estimated row count for Optimizable "0" is 28, for Optimizable "1" is 48, for Optimizable
"2" is 8, and for Optimizable "3" is 38. The ordering of the Optimizables based on estimated
row count would then be [ 2 0 3 1 ], so that becomes the "target" join order to which we want
to jump.
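Computing that target order amounts to sorting the Optimizable indices by estimated row count. A small sketch (not Derby's actual code):

```java
import java.util.Comparator;
import java.util.stream.IntStream;

// Sketch (not Derby's actual code) of computing the "target" join
// order for jumping: sort the Optimizable indices by their estimated
// row counts, from least to greatest.
public class JumpTarget {
    public static int[] targetOrder(double[] estimatedRowCounts) {
        return IntStream.range(0, estimatedRowCounts.length)
                .boxed()
                .sorted(Comparator.comparingDouble(i -> estimatedRowCounts[i]))
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

With the row counts from the example (28, 48, 8, and 38 for Optimizables 0 through 3), this yields the target order [ 2 0 3 1 ].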
Now that we have a target join order, the next four calls to getNextPermutation() will return
permutations that lead up to the target join order. In this case they will be:
{{{
5th call: [ 2 -1 -1 -1 ]
6th call: [ 2 0 -1 -1 ]
7th call: [ 2 0 3 -1 ]
8th call: [ 2 0 3 1 ]
}}}
As we can see from this example, the use of "jumping" allowed us to get to the target join
order in just 8 calls to getNextPermutation(). If we had not done the jumping, we would have
had to make *many* more calls to getNextPermutation() before we arrived at [ 2 0 3 1]. Depending
on what each of the Optimizables represents, it is possible that we would have encountered
optimizer timeout before reaching the target join order, and therefore may have missed out
on what the code guessed would be the best join order for the !OptimizerImpl. By "jumping"
directly to the target join order, we minimize the chance of timing out before having found
the estimate for that join order.
Note that once we have successfully jumped to the target join order, processing of getNextPermutation()
continues as normal. Thus the next several calls would return:
{{{
9th call: [ 2 1 -1 -1 ]
10th call: [ 2 1 0 -1 ]
11th call: [ 2 1 0 3 ]
12th call: [ 2 1 3 -1 ]
13th call: [ 2 1 3 0 ]
etc..
}}}
Eventually getNextPermutation() will return the "highest" permutation, [ 3 2 1 0]. For "normal"
processing that would be the last permutation and subsequent calls to getNextPermutation()
would return "false". However, since we are jumping, we do not stop there. Instead, getNextPermutation()
will go back to the beginning of the permutations to try out the ones over which we "jumped".
Note that we will only do this if we have not seen optimizer timeout yet.
The actual code that does all of this jump-related work is pretty well commented, so we do
not go into details here. Please see {{{OptimizerImpl.getNextPermutation()}}} for the details.
'''Jump Case 2:'''
The second "Jump" situation can occur when one !OptimizerImpl, call it OI_1, has a list of
Optimizables that includes a non-flattened subquery which is itself represented by a nested
!OptimizerImpl, OI_2. In that case we may end up doing several "rounds" of optimization for
OI_2. Loosely speaking, whenever a call to getNextPermutation() on some !OptimizerImpl returns
"false", we consider that to be the end of a "round" of optimization. If later another call
to getNextPermutation() is made on that same !OptimizerImpl, that marks the beginning of another
round.
In cases where an !OptimizerImpl undergoes several rounds of optimization, we want to avoid
trying out the exact same permutations with the exact same Optimizables if we know that the
permutations will lead to suboptimal join orders. The code in getNextPermutation() accomplishes
this by (conditionally) "jumping" to the best join order that was found in the previous round
of optimization. So if we return yet again to our example of an !OptimizerImpl with four Optimizables,
let's assume that this time the !OptimizerImpl is nested and will undergo two rounds of optimization.
Let's further say that, upon completion of the first round, the best join order for the nested
!OptimizerImpl was found to be [ 2 1 3 0].
Now when we begin the second round of optimization, there are certain conditions under which
we can say with confidence that the best join order from the first round is likely to be the
best join order for the second round, as well. If those conditions are met, getNextPermutation()
will take the best join order from the first round (which in our example is [ 2 1 3 0 ]) and
will make that the "target" join order. Then it will do the "jump" processing outlined above
so that it can get to that join order as quickly as possible. In this particular example
we will get to the target join order after only four calls to getNextPermutation(). We then
estimate the cost of that join order and continue processing from there, as described for
the first "jump" situation outlined above.
Again, the actual code that determines if and when this second "jump" scenario can occur is
well commented in !OptimizerImpl.java, so interested readers should take a look at the relevant
code in {{{OptimizerImpl.getNextPermutation()}}}.
The final way in which the permutation algorithm can be "altered" is in the event of optimizer
"timeout". For more on when that happens and what the resultant behavior is, see OptimizerTimeout.
== In the Code: Placing and Removing Optimizables ==
In the brief description of getNextPermutation() given at the top of this page, we mentioned
that the method is responsible for placing Optimizables in, and removing them from, the current
join order. Finding the permutations as described above is the first part of that task, but
that is not all we do. As illustrated by the presence of so many ellipses ("...") in the code
fragment in "The Implementation" section, there is a lot of code in getNextPermutation() that
does not directly relate to the task of finding join order permutations. Instead, most of
the missing code deals with the placement/removal of an Optimizable at/from a specific position
in the join order.
=== Placing an Optimizable ===
Whenever a call to getNextPermutation() successfully finds another permutation, the method
must also "place" an Optimizable at the most-recently-set position in the permutation *before*
returning "true". Placement of an Optimizable occurs as part of Step 6 in the "Algorithm"
section above and involves the following four things:
1. Clearing out the "best access path" for the Optimizable. If set, the "best access path"
for the Optimizable is lingering from a previous permutation and thus needs to be cleared
out in preparation for evaluation of the current permutation.
1. Adding the table numbers that are referenced by the Optimizable to the !OptimizerImpl's
{{{assignedTableMap}}}, which is used for determining the legality of different permutations.
Here we use the term "legal" to mean that if any Optimizable in the list has dependencies
on other Optimizables in the list, the position of the Optimizables in the current permutation
satisfies those dependencies.
1. Telling the Optimizable to "start optimizing", which allows the Optimizable to take any
steps required to prepare for finding "decorations" and for estimating the cost of those decorations.
See [http://wiki.apache.org/dbderby/LanguageOptimize Optimizer Overview] for a definition
of what a "decoration" is and how it is used.
1. Searching the !OptimizerImpl's list of predicates and pushing the ones that apply to
Optimizable down to that Optimizable for cost evaluation. For more on the process of pushing
predicates, see PredicatePushdown.
The actual code for doing these things is pretty straightforward and is shown here (code pulled
directly from {{{OptimizerImpl.getNextPermutation()}}}):
{{{
    /* Reinit (clear out) the cost for the best access path
     * when placing a table.
     */
    optimizableList.getOptimizable(nextOptimizable).
        getBestAccessPath().setCostEstimate((CostEstimate) null);

    /* Set the assigned table map to be exactly the tables
     * in the current join order.
     */
    assignedTableMap.clearAll();
    for (int index = 0; index <= joinPosition; index++)
    {
        assignedTableMap.or(optimizableList.getOptimizable(
            proposedJoinOrder[index]).getReferencedTableMap());
    }

    if (optimizerTrace)
    {
        trace(CONSIDERING_JOIN_ORDER, 0, 0, 0.0, null);
    }

    Optimizable nextOpt =
        optimizableList.getOptimizable(nextOptimizable);
    nextOpt.startOptimizing(this, currentRowOrdering);

    pushPredicates(
        optimizableList.getOptimizable(nextOptimizable),
        assignedTableMap);

    return true;
}}}
As can be seen by the last line of this block of code, once the Optimizable has been placed
at the most-recently-set position in the permutation, getNextPermutation() returns "true".
Hence the correlation with Step 6 in the algorithm section above.
To take an example, let's say that we have an !OptimizerImpl with four Optimizables in its
list and that the next permutation is [ 1 2 3 -1 ]. The "most-recently-set" position is just
the last position for which the value is non-negative. So in this case it is position "2"
in the array (array indexes start at 0), which has a value of "3". What we have to do, then,
is place Optimizable "3" at position "2" in the join order, which simply amounts to executing
the above code fragment with {{{nextOptimizable}}} set to "3". Note: Optimizable "3" is really
the 4th Optimizable in the list because Optimizable numbering starts at 0.
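The "most-recently-set position" rule from this example can be sketched as a tiny helper. This is illustrative only; Derby does not re-scan the array, since it already tracks this index in the {{{joinPosition}}} variable:

```java
// Illustrative sketch: the "most-recently-set" position is the last
// index in the array whose value is non-negative. (Derby itself tracks
// this index directly in the joinPosition variable.)
public class PlacementSketch {
    public static int mostRecentlySetPosition(int[] proposedJoinOrder) {
        int p = -1;  // -1 means no position has been set at all
        for (int i = 0; i < proposedJoinOrder.length; i++) {
            if (proposedJoinOrder[i] >= 0) {
                p = i;
            }
        }
        return p;
    }
}
```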
=== Removing an Optimizable ===
If we look back at the algorithm for finding the "next permutation" of a join order, the point
at which we must "remove" an Optimizable from the join order is step 5:
if {{{proposedJoinOrder[p] >= 0}}} then let {{{proposedJoinOrder[p] = -1}}}
This step is effectively "unassigning" position {{{p}}} in the join order. But it is not
enough to just set the integer array entry back to -1; we also have to do the following:
1. Subtract the estimated cost for the Optimizable from the running total cost of the current
join order. As part of the general code flow for optimization, we add the estimated cost for
an Optimizable to a running total and store that total within !OptimizerImpl itself. Thus
when we remove the Optimizable from the join order, we have to subtract that Optimizable's
estimated cost from the running total. Note that !OptimizerImpl actually keeps track of two
different costs: one for the "normal" cost estimate and one for the "sort avoidance" cost.
So when we remove an Optimizable from the join order we have to make sure to update *both*
of these costs accordingly.
1. "Pull" up any predicates that we pushed down to the Optimizable when we placed it. See
PredicatePushdown for more on how pulling of predicates works.
1. If the mostrecentlyevaluated permutation was *not* the best complete join order so
far, then when we remove the Optimizable from the join order we have to reset its best access
path to whatever it was when the best join order was found ('''if''' we have found a best
join order by now).
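The cost bookkeeping from the first step can be sketched as follows. This is a simplified, hypothetical sketch; predicate pulling and access path reloading (steps 2 and 3) are omitted:

```java
// Hypothetical sketch of the removal cost bookkeeping: the OptimizerImpl
// keeps TWO running totals, one for the "normal" cost estimate and one
// for the "sort avoidance" cost, and both must be updated when an
// Optimizable is placed in, or removed from, the join order.
public class RemovalSketch {
    double currentCost = 0.0;
    double currentSortAvoidanceCost = 0.0;

    // Placement: add the Optimizable's estimated costs to both totals.
    void place(double normalCost, double sortAvoidanceCost) {
        currentCost += normalCost;
        currentSortAvoidanceCost += sortAvoidanceCost;
    }

    // Removal, step 1: subtract the same costs from both totals.
    // (Steps 2 and 3, pulling predicates and reloading the best access
    // path, are not modeled here.)
    void remove(double normalCost, double sortAvoidanceCost) {
        currentCost -= normalCost;
        currentSortAvoidanceCost -= sortAvoidanceCost;
    }
}
```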
To continue our example from the "Placing an Optimizable" section, let's say that we choose
permutation [ 1 2 3 1], we place Optimizable "3" at position "2", we estimate the cost of
Optimizable "3" at that position (via a call to {{{OptimizerImpl.getNextDecoratedPermutation()}}}
and {{{OptimizerImpl.costPermutation()}}}; see [http://wiki.apache.org/dbderby/LanguageOptimize
here]), and we add the cost to the running total in !OptimizerImpl. We then make another call
to getNextPermutation(), we choose permutation [ 1 2 3 0], we place Optimizable "0" at position
"3", we estimate the cost of Optimizable "0" at that position, and we add the cost to the
running total. At this point we have a complete join order with some cost EC.
If we follow our algorithm for finding the next permutation, the next call to getNextPermutation()
is going to result in [ 1 3 -1 -1 ], which will require the removal of three Optimizables from
the join order, namely "0", "3", and "2". So we will do the first two removal steps for each
Optimizable. We will only do the third step, though, if prior to [ 1 2 3 0] we had already
found a different complete join order that was cheaper. Say, for example, that join order
[ 0 1 3 2] had a lower cost than [ 1 2 3 0]. Then every time we remove an Optimizable from
the join order we will "reload" that Optimizable's best access path to whatever it was for
join order [ 0 1 3 2].
To understand why this plan "reload" occurs we go back to the permutation algorithm given
in the "Algorithm" section above. One potentially hidden fact that follows from the algorithm
is that when we reach the point where we have exhausted all permutations (i.e. Step 8) we
will have removed all Optimizables from the join order, leaving {{{p == -1}}} and {{{proposedJoinOrder
== [ -1 -1 -1 ... ]}}}. This is the point at which we return "false". That said, the expectation
is that when getNextPermutation() returns false, every Optimizable in the list will hold its
best access path for whatever join order the !OptimizerImpl chose as its "best". This expectation
comes from the description of the "Cost-Based Optimization" phase described [http://wiki.apache.org/dbderby/LanguageOptimize
here].
So given that we remove all Optimizables from the join order before returning false, we want
to ensure that when we remove an Optimizable, the Optimizable knows what its best access path
is for the !OptimizerImpl's best join order. If we do that, then when getNextPermutation()
finally returns false we will already have the required best access paths for the Optimizables
loaded.
As mentioned above, once getNextPermutation() returns "false", we have concluded the current
"round" of optimization for the !OptimizerImpl in question. And so concludes this discussion
of the {{{OptimizerImpl.getNextPermutation()}}} method.
