db-derby-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "OptimizerTableNumbers" by BryanPendleton
Date Wed, 27 Sep 2006 19:46:33 GMT

The following page has been changed by BryanPendleton:
http://wiki.apache.org/db-derby/OptimizerTableNumbers

The comment on the change is:
Capture notes from DERBY-1866

New page:
In the query optimizer code, you will find that the code often manipulates information about
tables by using "table numbers". Table numbers are integers which stand in as surrogates for
particular tables during the optimization process.

 * How are Table Numbers assigned? How do views and synonyms affect this process? 

Note that at this point in the processing, constructs like views and synonyms have already been
transformed and replaced by their underlying "real" tables. Transformations and table resolution
occur during the "binding" and "preprocessing" stages of query compilation--and both of those
stages occur before optimization begins.  So at this point a view will be represented by a
ProjectRestrictNode whose child is a SelectNode, and a synonym will be represented by whatever
FromTable it (the synonym) is actually referring to.

Table numbers are also assigned during binding/preprocessing, so by the time we get to the
code involved with costing and plan selection, every FromTable (aka "Optimizable") in the
entire query that requires a table number will have one assigned.  Where a table number is not
required, its value is -1.  Additionally, any column reference which points to one of those FromTables
will have the table number for that FromTable stored locally (namely, in ColumnReference.tableNumber).

Note that when a ColumnReference is "remapped" to point to a different FromTable, its local
information--including tableNumber--is updated accordingly.  Note also that a "FromTable"
is not restricted to base tables--anything that can be specified in the FROM list of a SELECT
query will be represented by some instance of FromTable, whether it be a subquery, a base
table, a union node, etc.  Every FromTable has its own "table number", with the exception
of ProjectRestrictNodes.  For a PRN, if the PRN's child is itself a FromTable (as opposed
to, say, a SelectNode) then the PRN's table number will be -1 and any attempts to "get" the
PRN's table number will return the table number of the PRN's child.  If the PRN's child is
not a FromTable, then the PRN will have its own table number.
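
The ProjectRestrictNode rule above can be sketched roughly as follows. This is a simplified model, not Derby's actual source: the class names SketchFromTable, SketchBaseTable and SketchProjectRestrict, the constructor arguments, and the use of a plain Object for a non-FromTable child are all assumptions made for illustration.

```java
// Simplified model of the table-number rule; these are NOT Derby's real classes.
abstract class SketchFromTable {
    static final int UNSET = -1;

    // -1 means "no table number of its own".
    protected int tableNumber = UNSET;

    int getTableNumber() {
        return tableNumber;
    }
}

// Stand-in for any FromTable that always carries its own number (e.g. a base table).
class SketchBaseTable extends SketchFromTable {
    SketchBaseTable(int tableNumber) {
        this.tableNumber = tableNumber;
    }
}

// Stand-in for a ProjectRestrictNode.
class SketchProjectRestrict extends SketchFromTable {
    // Either a SketchFromTable or some other kind of node (e.g. a SELECT).
    private final Object child;

    SketchProjectRestrict(Object child, int numberIfChildIsNotFromTable) {
        this.child = child;
        // Only take a number of our own when the child cannot supply one.
        if (!(child instanceof SketchFromTable)) {
            this.tableNumber = numberIfChildIsNotFromTable;
        }
    }

    @Override
    int getTableNumber() {
        // When the child is a FromTable, report the child's number instead of -1.
        if (child instanceof SketchFromTable) {
            return ((SketchFromTable) child).getTableNumber();
        }
        return tableNumber;
    }
}
```

In this sketch a PRN wrapped around a base table with table number 3 keeps -1 in its own field but answers 3 when asked, while a PRN whose child is not a FromTable answers with the number it was given.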

 * If the optimizer is choosing to access an index for a table, rather than accessing the
table itself, does the table number change depending on whether it is an index or a base table
which is being processed by the ProjectRestrictNode?

The thing to note here is that "table number" is strictly a language-created, compile-time
value that allows binding, preprocessing, optimization, and code generation to distinguish
between the various FromTables in the original query.  A table number is not stored on disk
and it is independent of the access path decisions (including whether or not an index is used)
made by the optimizer.  Furthermore, there is no link between a given table number and the
actual on-disk table that it points to.  Table number 0 could be for T1 in one query, T2 in
another query, and T100 in a third query.

As a simple (but admittedly meaningless) example, take the following query:

  {{{select t1.i, x1.j from t1, t1 x1 where t1.i = x1.j;}}}

At bind time Derby will assign every item in the FROM list a table number.  So in this case,
"T1" gets table number 0 and "T1 X1" gets table number 1.  The fact that both FromTables are
really pointing to the same base table doesn't matter.  For the duration of compilation/optimization,
they are represented by two different instances of FromTable and are considered two different
"tables", each having its own table number.  (For the record, in this particular example the
different FromTables will in fact point to the same underlying tableDescriptor field).

Given that, the predicate "t1.i = x1.j" will have a left ColumnReference pointing to a FromBaseTable
representing T1 with table number "0" and a right ColumnReference pointing to a different
FromBaseTable representing X1 (i.e. T1 again) with table number "1".
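
As a rough illustration of the example above, the following sketch numbers the entries of a FROM list and lets each column reference keep a local copy of its source's table number. The names FromEntrySketch, ColumnRefSketch and BindSketch are hypothetical, and handing out numbers in FROM-list order from a counter starting at 0 is an assumption that merely matches the numbering described above; it is not taken from Derby's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a FromBaseTable: one instance per FROM-list entry.
class FromEntrySketch {
    final String baseTable;    // plays the role of the shared table descriptor
    final String exposedName;  // correlation name if given, else the table name
    final int tableNumber;     // compile-time surrogate, never stored on disk

    FromEntrySketch(String baseTable, String exposedName, int tableNumber) {
        this.baseTable = baseTable;
        this.exposedName = exposedName;
        this.tableNumber = tableNumber;
    }
}

// Hypothetical stand-in for a ColumnReference, which stores its source's table number locally.
class ColumnRefSketch {
    final String column;
    final int tableNumber;

    ColumnRefSketch(String column, FromEntrySketch source) {
        this.column = column;
        this.tableNumber = source.tableNumber;
    }
}

class BindSketch {
    // Each item is {baseTableName, correlationNameOrNull}.
    // Assumption: numbers are assigned in FROM-list order, starting at 0 for every statement.
    static List<FromEntrySketch> bindFromList(String[][] fromList) {
        List<FromEntrySketch> bound = new ArrayList<>();
        for (String[] item : fromList) {
            String exposed = (item[1] != null) ? item[1] : item[0];
            bound.add(new FromEntrySketch(item[0], exposed, bound.size()));
        }
        return bound;
    }
}
```

Binding the FROM list {{"T1", null}, {"T1", "X1"}} with this sketch produces two distinct entries numbered 0 and 1 that share the same base table name, and column references built for t1.i and x1.j carry 0 and 1 respectively.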

If the optimizer then decides to use an index for T1, the table number doesn't change--the
optimizer just decides that for "the FromBaseTable whose table number is 0 we will use an
index".  In fact, once assigned, the table number for a specific FromTable remains the same
for the duration of the compilation of the statement.
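
One way to picture this separation is a node whose table number is fixed at construction while its access path stays open to revision. The class OptimizableSketch, its AccessPath enum and the chooseAccessPath method are hypothetical names used only for this sketch; they do not reproduce Derby's Optimizable interface.

```java
// Hypothetical stand-in for a FromBaseTable as seen during optimization.
class OptimizableSketch {
    enum AccessPath { TABLE_SCAN, INDEX_SCAN }

    // Assigned once at bind time; final, so no later decision can alter it.
    private final int tableNumber;

    // The optimizer's current choice; may be revised while plans are costed.
    private AccessPath accessPath = AccessPath.TABLE_SCAN;

    OptimizableSketch(int tableNumber) {
        this.tableNumber = tableNumber;
    }

    int getTableNumber() {
        return tableNumber;
    }

    AccessPath getAccessPath() {
        return accessPath;
    }

    // Records an access path decision without touching the table number.
    void chooseAccessPath(AccessPath chosen) {
        this.accessPath = chosen;
    }
}
```

Choosing INDEX_SCAN for an instance created with table number 0 changes only the recorded access path; getTableNumber() still returns 0.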
