db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "PredicatePushdown" by Army
Date Thu, 09 Nov 2006 23:39:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by Army:
http://wiki.apache.org/db-derby/PredicatePushdown

New page:
As part of the [http://wiki.apache.org/db-derby/LanguageOptimize optimization] phase for a
query, Derby looks at all "optimizable" result sets and tries to determine what the best "access
path" for each of those result sets is. 

One of the key factors in determining the best access path for an Optimizable--and for a query
overall--is whether or not there are predicates available for that Optimizable to use. Since
predicates are the means by which indexes become useful, the more we can push them down into
base tables, the more likely we are to find useful indexes. The use of indexes can in turn
reduce the number of rows we have to read from disk, thus leading to considerable performance
gains in many cases.

For example, take the query:

{{{
select ppl_info.id, ppl_info.fullname
  from happy_ppl_ids, ppl_info
  where happy_ppl_ids.id = ppl_info.id
}}}

There are two possible [http://wiki.apache.org/db-derby/JoinOrderPermutations "complete join
orders"] for this query:

  * [{{{happy_ppl_ids}}}, {{{ppl_info}}}]
  * [{{{ppl_info}}}, {{{happy_ppl_ids}}}]

In the first join order, {{{ppl_info}}} is called the "inner" table for the join; in the second
join order, the inner table would be {{{happy_ppl_ids}}}.

If Derby did not attempt to "push" the WHERE predicate down to the "inner" table for these
join orders, then execution of the query would result in a full cross join between the two
tables before the predicate could be applied. Thus every row in both tables would be read
from disk, even though most of the rows might then be discarded because they do not satisfy
the predicate. So if, as an example, {{{happy_ppl_ids}}} has a hundred (100) rows in it and
{{{ppl_info}}} has a thousand (1000) rows in it, and if the {{{id}}} field for each table
is unique, then the result of the query would return at most one hundred rows. But without
predicate pushdown, we would end up reading all rows in both tables from disk, only to then
discard at least 900 (90 percent) of the rows that we read from {{{ppl_info}}}. For large
tables with many and/or large columns, this extra I/O overhead can lead to major performance
problems on even simple queries.

If, on the other hand, the Derby optimizer chose to use {{{ppl_info}}} as the inner table
and then "pushed" the WHERE predicate down to that table, we could then apply the predicate
at the "store layer". This means that instead of reading all rows from disk and then applying
the predicate, we actually use the predicate to determine what rows we need to read from disk.
This way we only read rows from disk that satisfy the predicate. So in the example above,
if there was an index defined on the {{{id}}} column of {{{ppl_info}}}, we would only have
to read at most 100 rows from that table (instead of the full thousand).

Thus when optimizing a query that has predicates, Derby attempts to push those predicates
down to the Optimizables in the query when possible. That said, there are two different "phases"
of optimization during which the predicates have to be pushed: the first is the "cost-based
optimization" phase, the second is the "modification of access paths" phase. High-level descriptions
of these two phases can be found [http://wiki.apache.org/db-derby/LanguageOptimize here].

== Pushing Predicates: Cost-Based Optimization ==

The determination of which predicates can be pushed to which Optimizables is dependent on
the "join order permutation" in question. It is not surprising, then, that the point of entry
for predicate pushdown in the code is in the [http://wiki.apache.org/db-derby/JoinOrderPermutations
getNextPermutation()] method of Derby's !OptimizerImpl class.

Every time a call to {{{OptimizerImpl.getNextPermutation()}}} is made, that method figures
out which predicates in the WHERE clause can be evaluated against the most-recently-placed
Optimizable in the join order. If there is at least one such predicate, then the getNextPermutation()
method makes the call to push the predicate(s) down to the Optimizable for cost evaluation.
In the code the call to push the predicates is simply:

{{{
pushPredicates(
    optimizableList.getOptimizable(nextOptimizable),
    assignedTableMap);
}}}

where {{{nextOptimizable}}} indicates which Optimizable was most recently placed in the join
order.

=== The pushPredicates() method ===

The primary task of {{{OptimizerImpl.pushPredicates()}}} is to figure out which predicates
in the !OptimizerImpl's list of predicates can be pushed to the received Optimizable. In order
to do so, the method relies on the predicate's [http://wiki.apache.org/db-derby/ReferencedTableMaps
referenced table map], which effectively indicates what Optimizables are referenced by the
predicate.

==== Pushdown Criteria ====

With the "referenced tabe map" information in hand, we begin by saying that a predicate {{{pred}}}
can only be pushed to a target Optimizable {{{op_target}}} at join order position {{{jpos}}}
if for '''every''' Optimizable {{{op_ref}}} that {{{pred}}} references, the following is true:

  * Pushdown Condition #1: {{{op_ref}}} has an assigned position in the current join order
(at or before {{{jpos}}})

This condition is often sufficient for determining when to push a predicate. As an example,
let's say we have a three-way join of tables T1, T2, and T3 in a query such as this:

{{{
select * from t1, t2, t3
  where t1.c1 = t2.c2
  and t2.c3 = t3.c1
}}}

Using the Derby algorithm for finding a [http://wiki.apache.org/db-derby/JoinOrderPermutations
"next permutation"], it will take three calls to getNextPermutation() before we reach our
first complete join order. Each call will attempt to push the query's WHERE predicates down
to the most-recently-placed Optimizable in the join order based on the above criteria. The
results will be as follows:

''1st call to getNextPermutation()''
{{{
  join order = [ 0 -1 -1 ]
  jpos = 0
  op_target = optimizable "0" in the list = T1
  let pred = "t1.c1 = t2.c2"
    then pred references T1 and T2
      let op_ref = T1
        T1 has an assigned position of "0" which is <= jpos,
          so we've met condition 1.
      let op_ref = T2
        T2 does not have an assigned position in the join order.
    Therefore we do NOT push pred to op_target.
  let pred = "t2.c3 = t3.c1"
    then pred references T2 and T3
      let op_ref = T2
        T2 does not have an assigned position in the join order.
    Therefore we do NOT push pred to op_target.
}}}

Since we did not push any predicates, !OptimizerImpl's pred list still has both predicates
in it: { {{{(t1.c1 = t2.c2)}}}, {{{(t2.c3 = t3.c1)}}} }.

''2nd call to getNextPermutation()''
{{{
  join order = [ 0  1 -1 ]
  jpos = 1
  op_target = optimizable "1" in the list = T2
  let pred = "t1.c1 = t2.c2"
    then pred references T1 and T2
      let op_ref = T1
        T1 has an assigned position of "0" which is <= jpos,
          so we've met condition 1.
      let op_ref = T2
        T2 has an assigned position of "1" which is <= jpos,
          so we've met condition 1.
    Therefore we DO push pred to op_target (i.e. to T2).
  let pred = "t2.c3 = t3.c1"
    then pred references T2 and T3
      let op_ref = T2
        T2 has an assigned position of "1" which is <= jpos,
          so we've met condition 1.
      let op_ref = T3
        T3 does not have an assigned position in the join order.
    Therefore we do NOT push pred to op_target.
}}}

Since we pushed {{{(t1.c1 = t2.c2)}}} down to T2, that predicate will then be removed from
!OptimizerImpl's pred list, leaving: { {{{(t2.c3 = t3.c1)}}} }

''3rd call to getNextPermutation()''
{{{
  join order = [ 0  1  2 ]
  jpos = 2
  op_target = optimizable "2" in the list = T3
  let pred = "t2.c3 = t3.c1"
    then pred references T2 and T3
      let op_ref = T2
        T2 has an assigned position of "1" which is <= jpos,
          so we've met condition 1.
      let op_ref = T3
        T3 has an assigned position of "2" which is <= jpos,
          so we've met condition 1.
    Therefore we DO push pred to op_target (i.e. to T3).
}}}

And since we pushed the predicate {{{(t2.c3 = t3.c1)}}} down to T3, that predicate will be
removed from !OptimizerImpl's pred list, which means it is now empty.

The single condition mentioned above is sufficient for most predicates that occur in Derby,
as we have just seen. However, that condition does not cover all cases. In situations where
the predicate in question will be joining some Optimizable in !OptimizerImpl with another
Optimizable from elsewhere in the query tree, Condition #1 is insufficient. For example we
might have 
the following query:

{{{
select * from t3 left outer join t2 on t2.j = t3.k
}}}

whose query tree that looks like this:

{{{
      SelectNode0
        (PRN1)
          |
   HalfOuterJoinNode
        /   \
      PRN2  PRN3
       |     |
       T3    T2
}}}

In this case we will create a separate !OptimizerImpl for T2 and another one for T3 (see {{{TableOperatorNode.optimizeSource()}}},
which is called from {{{JoinNode.optimizeIt()}}}). Let's call these new !OptimizerImpls {{{OI_T2}}}
and {{{OI_T3}}}, respectively. Notice that we have an equality predicate between T3 and T2
and we want to push that predicate to the "inner" table--i.e. to T2. When it comes time to
call {{{OI_T2.getNextPermutation()}}}, we will end up at {{{OI_T2.pushPredicates()}}}, which
is where we hope to push the predicate. However, if the first "pushdown" condition shown above
is our only condition, we will '''not''' push the predicate:

''1st call to getNextPermutation()'' (on {{{OI_T2}}})
{{{
  join order = [ 0 ]
  jpos = 0
  op_target = optimizable "0" in the list = T2
  let pred = "t2.j = t3.k"
    then pred references T2 and T3
      let op_ref = T2
        T2 has an assigned position of "0" which is <= jpos,
          so we've met condition 1.
      let op_ref = T3
        T3 does not have an assigned position in the join order.
    Therefore we do NOT push pred to op_target.
}}}

In this case the predicate references T3, but T3 will never be placed in the join order for
{{{OI_T2}}} because that !OptimizerImpl only has one Optimizable in its list--namely, T2.
So in order to allow the pushing of the predicate in this case, we add a second "pushdown"
condition as follows:

A predicate {{{pred}}} can only be pushed to a target Optimizable {{{op_target}}} at join
order position {{{jpos}}} if for '''every''' Optimizable {{{op_ref}}} that {{{pred}}} references,
'''one of''' the following is true:

  * Pushdown Condition #1: {{{op_ref}}} has an assigned position in the current join order
(at or before {{{jpos}}})
  * Pushdown Condition #2: {{{op_ref}}} is '''not''' included in !OptimizerImpl's list of
Optimizables

Now when we try to push the predicate to T2 in the above example, we will successfully do
so:

''1st call to getNextPermutation()''
{{{
  join order = [ 0 ]
  jpos = 0
  op_target = optimizable "0" in the list = T2
  let pred = "t2.j = t3.k"
    then pred references T2 and T3
      let op_ref = T2
        T2 has an assigned position of "0" which is <= jpos,
          so we've met condition 1.
      let op_ref = T3
        T3 is not included in OI_T2's list of Optimizables,
          so we've met condition 2.
    Therefore we DO push pred to op_target (i.e. to T2).
}}}

Unfortunately, the addition of this second condition creates problems in the relatively rare
cases where we have "scoped" predicates. A scoped predicate is a predicate that has already
been pushed from some outer query down to the !OptimizerImpl for which we are trying out permutations.
Such a predicate's "referenced table map" may not be in sync with its actual column references.
Or put another way, the predicate's referenced map may not actually represent the Optimizables
that are ultimately referenced by the predicate. For example, assume we have the following
query:

{{{
select * from
  T1,
  (select t2.p, t3.a from t2, t3
    UNION select j as p, k as a from t4) X1
where t1.i = X1.a
}}}

The query tree for this query would look something like the following, where "PRN" stands
for !ProjectRestrictNode:

{{{
      SelectNode0
     (PRN0, PRN1)
       |     |
       T1 UnionNode
           /   |
         PRN2  PRN3
          |     |
  SelectNode1   SelectNode2
   (PRN4, PRN5)    (PRN6)
     |     |         |
     T2    T3        T4
}}}

We can see from this query tree that "X1.a" points to the Union node and ultimately maps to
the base table T3 in {{{SelectNode1}}} and to the base table T4 in {{{SelectNode2}}}. So as
part of [http://wiki.apache.org/db-derby/PredPushingWithUnions UnionNode processing] the predicate
will be "scoped" to PRN2 and PRN3. The newly-scoped predicates will then get passed to the
!OptimizerImpls for {{{SelectNode1}}} and {{{SelectNode2}}}, respectively.

Let's now assume that we are in the "pushPredicates()" method of the !OptimizerImpl for {{{SelectNode1}}}.
The first call to getNextPermutation() will result in the scoped predicate being pushed as
follows:

''1st call to getNextPermutation()'' (on {{{SelectNode1}}}'s !OptimizerImpl)
{{{
  join order = [ 0 -1 ]
  jpos = 0
  op_target = optimizable "0" in the list = PRN4
  let (scoped) pred = "t1.1 = PRN2.a"
    then pred references T1 and PRN2
      let op_ref = T1
        T1 is not included in !OptimizerImpl's list of
          Optimizables, so we've met condition 2.
      let op_ref = PRN2
        PRN2 is not included in !OptimizerImpl's list
          of Optimizables, so we've met condition 2.
    Therefore we DO push pred to op_target (i.e. to PRN4).
}}}

But that is WRONG. We said at the start that the predicate ultimately maps to T3, so we should
NOT be pushing it to PRN4 (T2). Instead we should be pushing it to PRN5 (T3). For that reason
we update our pushdown criteria as follows to account for scoped predicates:

A predicate {{{pred}}} can only be pushed to a target Optimizable {{{op_target}}} at join
order position {{{jpos}}} if the following are BOTH true:

  1. For '''every''' Optimizable {{{op_ref}}} that {{{pred}}} references, '''one of''' the
following is true:
    * Pushdown Condition #1: {{{op_ref}}} has an assigned position in the current join order
(at or before {{{jpos}}})
    * Pushdown Condition #2: {{{op_ref}}} is not included in !OptimizerImpl's list of Optimizables
  2. For '''at least one''' {{{op_ref}}} that {{{pred}}} references, the following is true:
    * Pushdown Condition #3: {{{pred}}} references some '''base''' table in the subtree rooted
at {{{op_ref}}} and that same base table is also in the subtree rooted at {{{op_target}}}.
 

So now we revisit our UNION example with this new pushdown criteria. Remember that in this
example we said the '''base''' table that is referenced by the scoped predicate is T3.

''1st call to getNextPermutation()''
{{{
  join order = [ 0 -1 ]
  jpos = 0
  op_target = optimizable "0" in the list = PRN4
  let (scoped) pred = "t1.1 = PRN2.a"
    then pred references T1 and PRN2
      let op_ref = T1
        T1 is not included in !OptimizerImpl's list of
          Optimizables, so we've met condition 2.
      let op_ref = PRN2
        PRN2 is not included in !OptimizerImpl's list of
          Optimizables, so we've met condition 2.
      pred references base tables T1 and T3, but neither of
        those base tables appear in the subtree rooted at PRN4.
    Therefore we do NOT push pred to op_target (i.e. to PRN4).
}}}

''2nd call to getNextPermutation()''
{{{
  join order = [ 0 1 ]
  jpos = 1
  op_target = optimizable "1" in the list = PRN5
  let (scoped) pred = "t1.1 = PRN2.a"
    then pred references T1 and PRN2
      let op_ref = T1
        T1 is not included in !OptimizerImpl's list of
          Optimizables, so we've met condition 2.
      let op_ref = PRN2
        PRN2 is not included in !OptimizerImpl's list of
          Optimizables, so we've met condition 2.
      pred references base tables T1 and T3, and table T3
        DOES appear in the subtree rooted at PRN5.
    Therefore we DO push pred to op_target (i.e. to PRN5).
}}}

So now we have correctly pushed the scoped predicate to the target Optimizable in {{{SelectNode1}}}'s
!OptimizerImpl's list of Optimizables.

I leave it as an exercise for the reader to verify that the updated pushdown criteria still
works correctly for the preceding examples in this section.

For more on scoped predicates, see PredPushingWithUnions.

===== A Subtlety =====

Thus far we have been talking about pushing a predicate as part of the "cost-based optimization"
phase of optimization. We mentioned above that the starting point for pushing a predicate
is in the "getNextPermutation()" method. For each join order permutation returned by that
method, some predicates can be pushed to the most-recently-placed Optimizable and others cannot.
The logic for determining where to push the predicate is found in {{{OptimizerImpl.pushPredicates()}}}.

Having said all of that, there is a potentially hidden fact about predicate pushdown that
we have not yet explicitly stated. Namely: a predicate that was pushed to some Optimizable
{{{OPT_1}}} in one permutation may be pushed to a '''different''' Optimizable {{{OPT_2}}}
in another permutation. This follows from the pushdown criteria mentioned above.

As an example, let's again look at the query shown at the very top of this page:

{{{
select ppl_info.id, ppl_info.fullname
  from happy_ppl_ids, ppl_info
  where happy_ppl_ids.id = ppl_info.id
}}}

Here we have one predicate and two Optimizables. There are a total of four different join
order permutations for this query, two of which are complete join orders. As we see below,
the first complete permutation will push the predicate to one Optimizable ({{{ppl_info}}}),
while the second complete permutation will push the predicate to the other Optimizable ({{{happy_ppl_ids}}}).

''1st call to getNextPermutation()''
{{{
  join order = [ 0 -1 ]
  jpos = 0
  op_target = optimizable "0" in the list = happy_ppl_ids
  let pred = "happy_ppl_ids.id = ppl_info.id"
    then pred references happy_ppl_ids and ppl_info
      let op_ref = happy_ppl_ids
        happy_ppl_ids has an assigned position of "0" which
          is <= jpos, so we've met condition 1.
      let op_ref = ppl_info
        ppl_info does not have an assigned position in the
          join order.
        ppl_info *is* included in the OptimizerImpl's list
          of Optimizables
    Therefore we do NOT push pred to op_target.
}}}

Since we did not push any predicates, !OptimizerImpl's pred list still has the single predicate
in it: { {{{(happy_ppl_ids.id = ppl_info.id)}}} }.

''2nd call to getNextPermutation()''
{{{
  join order = [ 0 1 ]
  jpos = 1
  op_target = optimizable "1" in the list = ppl_info
  let pred = "happy_ppl_ids.id = ppl_info.id"
    then pred references happy_ppl_ids and ppl_info
      let op_ref = happy_ppl_ids
        happy_ppl_ids has an assigned position of "0" which
          is <= jpos, so we've met condition 1.
      let op_ref = ppl_info
        ppl_info has an assigned position of "1" which is
          <= jpos, so we've met condition 1.
      pred references base table happy_ppl_ids and ppl_info,
        and ppl_info DOES appear appear in the subtree rooted
        at ppl_info.
    Therefore we DO push pred to op_target (i.e. to ppl_info).
}}}

So here we have our first complete join order in which {{{ppl_info}}} is the inner table.
We thus pushed the predicate down to {{{ppl_info}}}. Now let's see what happens for the second
complete join order...

''3rd call to getNextPermutation()''

'''NOTE:''' As part of this call we "pull" the predicate from {{{ppl_info}}} and put it back
into !OptimizerImpl's predicate list. See "Pulling a Predicate" below for more.

{{{
  join order = [ 1 -1 ]
  jpos = 0
  op_target = optimizable "1" in the list = ppl_info
  let pred = "happy_ppl_ids.id = ppl_info.id"
    then pred references happy_ppl_ids and ppl_info
      let op_ref = happy_ppl_ids
        happy_ppl_ids does not have an assigned position in
          the join order.
        happy_ppl_ids *is* included in the OptimizerImpl's
          list of Optimizables
    Therefore we do NOT push pred to op_target.
}}}

Since we did not push any predicates, !OptimizerImpl's pred list still has the single predicate
in it: { {{{(happy_ppl_ids.id = ppl_info.id)}}} }.

''4th call to getNextPermutation()''
{{{
  join order = [ 1 0 ]
  jpos = 1
  op_target = optimizable "0" in the list = happy_ppl_ids
  let pred = "happy_ppl_ids.id = ppl_info.id"
    then pred references happy_ppl_ids and ppl_info
      let op_ref = happy_ppl_ids
        happy_ppl_ids has an assigned position of "1" which
          is <= jpos, so we've met condition 1.
      let op_ref = ppl_info
        ppl_info has an assigned position of "0" which is
          <= jpos, so we've met condition 1.
      pred references base table happy_ppl_ids and ppl_info,
        and happy_ppl_ids DOES appear appear in the subtree
        rooted at happy_ppl_ids.
    Therefore we DO push pred to op_target (i.e. to happy_ppl_ids).
}}}

We can see that for this second complete join order the predicate ends up being pushed to
{{{happy_ppl_ids}}}. So for the two different join orders we pushed the predicate to two different
Optimizables. This is perfectly fine since, for a given join order, the Optimizable to which
the predicate is pushed will calculate its cost using the predicate, while the other Optimizable
will estimate a cost that does not take the predicate into account. It is the job of the !OptimizerImpl
to then figure out which join order yields the [http://wiki.apache.org/db-derby/DecoratedPermutations
best overall cost].

==== Pushing a Predicate ====

After having determined that a predicate can be pushed to the most-recently-placed Optimizable
in the join order (which is called {{{curTable}}} below), the pushPredicates() method does
the actual pushing with the following lines of code:

{{{
/*
** Finally, push the predicate down to the Optimizable at the
** end of the current proposed join order, if it can be evaluated
** there.
*/
if (pushPredNow)
{
    /* Push the predicate and remove it from the list */
    if (curTable.pushOptPredicate(pred))
    {
        predicateList.removeOptPredicate(predCtr);
    }
}
}}}

As with the vast majority of [http://wiki.apache.org/db-derby/LanguageOptimize optimizer processing],
the core of the work for each Optimizable is done by the Optimizable itself. In this particular
case the Optimizable gets a predicate from !OptimizerImpl and determines if a) the predicate
can be enforced by the Optimizable itself, or b) the predicate can be pushed further down
the subtree whose root is the Optimizable. If either of these is possible then the Optimizable's
"pushOptPredicate()" method will return true. Otherwise it will return false.

Generally speaking, the Optimizables which are capable of enforcing or pushing predicates
further down the tree are subclasses to either !SingleChildResultSetNode, which is result
set node that has a child result set, or !TableOperatorNode, which is a result set node that
has *two* children result sets.

For example, if we search the Derby codeline for classes that define a "pushOptPredicate()"
method, we will see the following:

  * !SingleChildResultSetNodes:
    * !DistinctNode
    * !GroupByNode
    * !ProjectRestrictNode

  * !TableOperatorNodes:
    * !HalfOuterJoinNode
    * !JoinNode
    * !SetOperatorNode

  * Others:
    * !FromBaseTable
    * !FromTable
    * !FromVTI

Note that !FromTable, which is an "ancestor" class to all Optimizables, will always return
"false". This means that the default behavior is to ''not'' push a predicate down. If a specific
type of result set is capable of enforcing the predicate or of pushing the predicate down
the tree, then it must define its own "pushOptPredicate()" method (either directly or indirectly
through inheritance).

For example, !FromBaseTable defines its own pushOptPredicate() because it (!FromBaseTable)
is able to enforce the predicate on its own by passing the predicate as a qualifier to the
store layer at execution time. Thus the {{{FromBaseTable.pushOptPredicate()}}} method simply
stores the predicate locally and returns "true".

The other result set of interest here is the !ProjectRestrictNode. This node is particularly
important because for any SELECT query that Derby optimizes, we guarantee that a !ProjectRestrictNode
will exist above all Optimizables in the SELECT query's FROM list. This guarantee is enforced
as part of the "preprocessing" phase of optimization, which is described in more detail [http://wiki.apache.org/db-derby/LanguageOptimize
here].

Thus for most queries, when we call {{{curTable.pushOptPredicate()}}} in the "pushPredicates()"
method of !OptimizerImpl, we will end up at the pushOptPredicate() method of !ProjectRestrictNode.
The !ProjectRestrictNode (hereafter called "PRN") then stores the predicate locally in its
{{{restrictionList}}} field. Later, when the {{{OptimizerImpl.costPermutation()}}} method
makes a call to {{{optimizable.optimizeIt()}}} as described [http://wiki.apache.org/db-derby/LanguageOptimize
here], we will again end up back in !ProjectRestrictNode--but this time we will be in the
optimizeIt() method. The PRN.optimizeIt() method then takes the locally-stored predicate and
tries to pass it down to the PRN's child result set, where the child can then use it for estimating
its (the child's) cost. If the child result set does not use the predicate, then the PRN will
"default" to enforcing the predicate itself.

All of that said, we should mention here that it ''is'' possible to have an !OptimizerImpl
containing one or more Optimizables that are not instances of !ProjectRestrictNode. For the
case of an !OptimizerImpl that is created for a SELECT query, we will always have PRNs as
mentioned above. But !OptimizerImpls can be created for other parts of a query tree, as well.
For example, any class that extends !TableOperatorNode will, as part of a call to "optimizeIt()",
create an !OptimizerImpl that contains the left child and another that contains the right
child. In these cases the child is ''not'' guaranteed to be a !ProjectRestrictNode. It could
just as easily be some other instance of Optimizable. What this means is that any Optimizable
that is capable of enforcing a predicate or of pushing the predicate further down the tree
must implement the "pushOptPredicate()" method. It is not correct to assume that a !ProjectRestrictNode
will always exist above all Optimizables in an !Optim
 izerImpl's list.

=== Pulling a Predicate ===

Hopefully we now have a general understanding of what it means to "push" a predicate to a
target Optimizable during cost-based optimization. There is, however, one final area of discussion
that we have not yet covered.

If we look again at the section entitled "A Subtlety" above, we see a case where a predicate
is pushed to two different target Optimizables depending on which join order permutation is
being evaluated. The scenario is explained in detail above, but there is one piece missing:
when we pushed the predicate to {{{ppl_info}}} as part of the 2nd call to getNextPermutation(),
we removed it from !OptimizerImpl's predicate list (see the code snippet in the preceding
section). So how were we able to push that '''same''' predicate to {{{happy_ppl_ids}}} as
part of the 4th call to getNextPermutation(), as well?

The answer is pretty intuitive: between the 2nd and 4th calls to getNextPermutation(), we
"pulled" the predicate back up into the !OptimizerImpl's list of predicates. As described
on the [http://wiki.apache.org/db-derby/PredicatePushdown getNextPermutation()] page, whenever
we remove an Optimizable from the join order we also make sure that we "pull" any predicates
that were pushed to that Optimizable back up. More concretely, when we made the 3rd call to
getNextPermutation() we removed both {{{ppl_info}}} and {{{happy_ppl_ids}}} from the join
order, and then we placed {{{ppl_info}}} at the first position for the "next" permutation.
Since we had already pushed the predicate to {{{ppl_info}}} as part of the 2nd call, we pulled
it back up when we removed {{{ppl_info}}} from the join order. Thus the predicate was again
sitting in !OptimizerImpl's list of predicates and therefore we were able to push it to {{{happy_ppl_ids}}}
as part of the 4th call to getNextPermutation().

To find the code that initiates the "pull" process, we look again at the getNextPermutation()
method:

{{{
/*
** Pull the predicates at from the optimizable and put
** them back in the predicate list.
*/
pullMe.pullOptPredicates(predicateList);
}}}

In this case "pullMe" is the Optimizable that we are removing from the join order. Not surprisingly,
the Optimizable is (yet again) responsible for doing the work. Here we pass down the !OptimizerImpl's
list and ask the Optimizable to "put back" any predicates that it received from earlier calls
to pushOptPredicate().

If we again look at the nodes "of interest", we will see that the pullOptPredicates() method
of !FromTable does nothing, as we would expect. This is because the pushOptPredicate() method
of that class does not do anything, either, so there is nothing to pull. In !FromBaseTable
the pullOptPredicates() method simply iterates through all locally-stored predicates and puts
them back into the received list. !ProjectRestrictNode does likewise. So given the ways in
which each of these classes behave in their pushOptPredicates(), their behavior in pullOptPredicates()
seems to make sense.

With this final piece in place we have now covered the general flow of code for pushing a
predicate during "cost-based optimization", which is the second of [http://wiki.apache.org/db-derby/LanguageOptimize
three phases] making up Derby's optimization process. The next section briefly discusses how
a predicate is pushed once and for all during the third and final phase of Derby optimization.

== Pushing Predicates: Modification of Access Paths ==

The final phase of Derby optimization is called "modifying access paths", which is the phase
in which we walk the tree of nodes and perform any work required to prepare each node for
code generation. This may include adding nodes to, or otherwise reshaping, parts of the tree.
In terms of pushing predicates, this traversal is very similar to the traversal we do during
optimization. This time, though, we want to leave the predicates in their pushed positions
so that, when it comes time to generate the code for the query, the predicates are where they
need to be. 

The code relevant to this particular discussion is exactly where we would expect to find it--namely,
in the "modifyAccessPaths()" method of !OptimizerImpl. That method first reorganizes the Optimizables
so that their order in the !OptimizerImpl's list matches the order that is given in the !OptimizerImpl's
{{{bestJoinOrder}}} array. Then the method iterates through the list and makes a call to the
'''same''' "pushPredicates()" method that we described in detail above. And finally, it makes
a call to the "modifyAccessPaths()" method of each Optimizable in the list. The code is simply:

{{{
/* Modify the access path of each table, as necessary */
for (int ictr = 0; ictr < numOptimizables; ictr++)
{
    Optimizable optimizable = optimizableList.getOptimizable(ictr);

    /* Current table is treated as an outer table */
    outerTables.or(optimizable.getReferencedTableMap());

    /*
    ** Push any appropriate predicates from this optimizer's list
    ** to the optimizable, as appropriate.
    */
    pushPredicates(optimizable, outerTables);

    optimizableList.setOptimizable(
        ictr,
        optimizable.modifyAccessPath(outerTables));
}
}}}

The thing to note here is that we are using the same method to push the predicates during
"modification of access paths" that we used when pushing them during "cost-based optimization".
This is important because we want to ensure that when we modify access paths, we push the
predicates to the Optimizables in the exact same way we did for the "best join order" saved
by !OptimizerImpl. Since the determination of which predicates can be pushed to which Optimizables
is made in the pushPredicates() method, and since that decision is dependent on join order,
we know that if we place the Optimizables in "best join order" and call pushPredicates(),
the predicates will go to the same target Optimizables that they went to when cost-based optimization
first found "bestJoinOrder".

At that point it is then up to each Optimizable to ensure that its use of the predicates for
"modification of access paths" parallels its use of those same predicates during cost-based
optimization. As a quick example we can take a look at !ProjectRestrictNode. In the "optimizeIt()"
method of that class we can see that the PRN's {{{restrictionList}}}, which holds the predicates
pushed to that PRN from pushOptPredicate(), is passed to the PRN's child during optimization:

{{{
if (childResult instanceof Optimizable)
{
    childCost = ((Optimizable) childResult).optimizeIt(
                                        optimizer,
                                        restrictionList,
                                        outerCost,
                                        rowOrdering);
    ...
}
else ...
{
    ...
    childResult = childResult.optimize(optimizer.getDataDictionary(), 
                                       restrictionList,
                                       outerCost.rowCount());
    ...
}
}}}

If we then look at PRN.modifyAccessPaths(), we see that similar calls are made in that method,
as well. Most notably:

{{{
/* When we optimized the child we passed in our restriction list
 * so that scoped predicates could be pushed further down the
 * tree. We need to do the same when modifying the access
 * paths to ensure we generate the same plans the optimizer
 * chose.
 */
childResult = childResult.modifyAccessPaths(restrictionList);
}}}

The expectation is that all other instances of Optimizable show a similar consistency between
the handling of predicates during "optimizeIt()" verses "modifyAccessPaths()" processing.
As long as this expectation is met, we will be able to generate a query plan that pushes predicates
where the optimizer thinks they will be most beneficial, and we can therefore reap the performance
benefits of "predicate pushdown" as a general optimization strategy.

Mime
View raw message