db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Trivial Update of "OptimizerTableNumbers" by Army
Date Fri, 10 Nov 2006 22:55:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by Army:
http://wiki.apache.org/db-derby/OptimizerTableNumbers

------------------------------------------------------------------------------
- In the query optimizer code, you will find that the code often manipulates information about
tables by using "table numbers". Table numbers are integers which stand in as surrogates for
particular tables during the optimization process.
+ In the Derby "cost-based optimization" code you will find that the code often manipulates
information about tables by using "table numbers". Table numbers are integers which stand
in as surrogates for particular tables during the optimization process.
  
   * How are Table Numbers assigned? How do views and synonyms affect this process? 
  
- Note that at this point in the processing, constructs like views and synonyms already been
transformed and replaced by their underlying "real" tables. Transformations and table resolution
occur during the "binding" and "preprocessing" stages of query compilation--and both of those
stages occur before optimization begins.  So at this point a view will be represented by a
ProjectRestrictNode whose child is a SelectNode, and a synonym will be represented by whatever
FromTable it (the synonym) is actually referring to.
+ Note that by the time we start optimization, constructs like views and synonyms have already
been transformed and replaced by their underlying "real" tables. Transformations and table
resolution occur during the [http://wiki.apache.org/db-derby/LanguageBinding "binding"] and
[http://wiki.apache.org/db-derby/LanguageOptimizer "preprocessing"] stages of query compilation--and
both of those stages occur before cost-based optimization begins.  So at this point a view
will be represented by a !ProjectRestrictNode whose child is a !SelectNode, and a synonym
will be represented by whatever !FromTable it (the synonym) is actually referring to.
  
- Table numbers are also assigned during binding/preprocessing, so by the time we get to the
code involved with costing and plan selection, all FromTables (aka  "Optimizables") in the
entire query will have an assigned table number (if required--in some cases it's not necessary
and thus will be -1).  Additionally any column reference which points to one of those FromTables
will have the table number for that FromTable stored locally (namely, in ColumnReference.tableNumber).
+ Table numbers are also assigned during binding, so by the time we get to the code involved
with costing and plan selection, all !FromTables (aka  "Optimizables") in the entire query
will have an assigned table number (if required--in some cases it's not necessary and thus
will be -1).  Additionally any column reference which points to one of those !FromTables will
have the table number for that !FromTable stored locally (namely, in {{{ColumnReference.tableNumber}}}).
  
- Note that when a ColumnReference is "remapped" to point to a different FromTable, its local
information--including tableNumber--is updated accordingly.  Note also that a "FromTable"
is not restricted to base tables--anything that can be specified in the FROM list of a SELECT
query will be represented by some instance of FromTable, whether it be a subquery, a base
table, a union node, etc.  Every FromTable has its own "table number", with the exception
of ProjectRestrictNodes.  For a PRN, if the PRN's child is itself a FromTable (as opposed
to, say, a SelectNode) then the PRN's table number will be -1 and any attempts to "get" the
PRN's table number will return the table number of the PRN's child.  If the PRN's child is
not a FromTable, then the PRN will have it's own table number.
+ Note that when a !ColumnReference is "remapped" to point to a different !FromTable, its
local information--including tableNumber--is updated accordingly.  Note also that a "!FromTable"
is not required to be a base table--rather, anything that can be specified in the FROM list
of a SELECT query will be represented by some instance of !FromTable, whether it be a subquery,
a base table, a union node, etc.  Every !FromTable has its own "table number", with the exception
of !ProjectRestrictNodes.  For a PRN, if the PRN's child is itself a !FromTable (as opposed
to, say, a !SelectNode) then the PRN's table number will be -1 and any attempts to "get" the
PRN's table number will return the table number of the PRN's child.  If the PRN's child is
not a !FromTable, then the PRN will have it's own table number.
  
-  * If the optimizer is choosing to access an index for a table, rather than accessing the
table itself, does the table number change depending on whether it is an index or a base table
which is being processed by the ProjectRestrictNode?
+  * If the optimizer is choosing to access an index for a table, rather than accessing the
table itself, does the table number change depending on whether it is an index or a base table
which is being processed by the !ProjectRestrictNode?
  
- The thing to note here is that "table number" is strictly a language-created, compilation
time value to allow binding, preprocessing, optimization, and code generation to distinguish
between the various FromTables in the original query.  A table number is not stored on disk
and it is independent of the access path decisions (including whether or not an index is used)
made by the optimizer.  Furthermore, there is no link between a given table number and the
actual on-disk table that it points to.  Table number 0 could be for T1 in one query, T2 in
another query, and T100 in a third query.
+ The thing to note here is that "table number" is strictly a language-created, compilation
time value to allow binding, preprocessing, optimization, and code generation to distinguish
between the various !FromTables in the original query.  A table number is not stored on disk
and it is independent of the access path decisions (including whether or not an index is used)
made by the optimizer.  Furthermore, there is no link between a given table number and the
actual on-disk table that it points to.  Table number 0 could be for T1 in one query, T2 in
another query, and T100 in a third query.
  
  As a simple (but admittedly meaningless) example, take the following query:
  
    {{{select t1.i, x1.j from t1, t1 x1 where t1.i = x1.j;}}}
  
- At bind time Derby will assign every item in the FROM list a table number.  So in this case,
"T1" gets table number 0 and "T1 X1" gets table number 1.  The fact that both FromTables are
really pointing to the same base table doesn't matter.  For the duration of compilation/optimization,
they are represented by two different instances of FromTable and are considered two different
"tables", each having its own table number.  (For the record, in this particular example the
different FromTables will in fact point to the same underlying tableDescriptor field).
+ At bind time Derby will assign every item in the FROM list a table number.  So in this case,
"T1" gets table number 0 and "T1 X1" gets table number 1.  The fact that both !FromTables
are really pointing to the same base table doesn't matter.  For the duration of compilation/optimization,
they are represented by two different instances of !FromTable and are considered two different
"tables", each having its own table number.  (For the record, in this particular example the
different !FromTables will in fact point to the same underlying {{{tableDescriptor}}} field).
  
- Given that, the predicate "t1.i = x1.j" will have a left ColumnReference pointing to a FromBaseTable
representing T1 with table number "0" and a right ColumnReference pointing to a different
FromBaseTable representing X1 (i.e. T1 again) with table number "1".
+ Given that, the predicate {{{t1.i = x1.j}}} will have a left !ColumnReference pointing to
a !FromBaseTable representing T1 with table number "0" and a right !ColumnReference pointing
to a different !FromBaseTable representing X1 (i.e. T1 again) with table number "1".
  
- If the optimizer then decides to use an index for T1, the table number doesn't change--the
optimizer just decides that for "the FromBaseTable whose table number is 0 we will use an
index".  In fact, once assigned, the table number for a specific FromTable remains the same
for the duration of the compilation of the statement.
+ If the optimizer then decides to use an index for T1, the table number doesn't change--the
optimizer just decides that for "the !FromBaseTable whose table number is 0 we will use an
index".  In fact, once assigned, the table number for a specific !FromTable remains the same
for the duration of the compilation of the statement.
  
   * Under what circumstances will a table number be -1?
  
- The only time a FromTable's table number can be -1 during optimization is if it is in fact
a ProjectRestrictNode with a non-FromTable subquery as its child.
+ The only time a !FromTable's table number can be -1 during optimization is if it is in fact
a !ProjectRestrictNode with a non-!FromTable subquery as its child.
  
- There are a number of classes that extend FromTable. For many of these classes, table numbers
are always set during binding (and thus will not be -1 when it comes time to optimize)
+ There are a number of classes that extend !FromTable. For many of these classes, table numbers
are always set during binding (and thus will not be -1 when it comes time to optimize).
  
- In some cases, classes which extend FromTable are only instantiated AFTER optimization has
completed, and thus even though their table numbers can be -1, that won't affect optimization.
For example,  IndexToBaseRowNode, and  SingleChildResultSetNode.
+ In some cases, classes which extend !FromTable are only instantiated AFTER optimization
has completed, and thus even though their table numbers can be -1, that won't affect optimization.
Two classes for which this is true are !IndexToBaseRowNode and !SingleChildResultSetNode.
  
- There are two subclasses of FromTable that can be instantiated during preprocessing but
that may not have their table numbers set (these are the ones of interest to the current discussion):
+ There are two subclasses of !FromTable that can be instantiated during preprocessing but
that may not have their table numbers set (these are the ones of interest to the current discussion):
  
-  * ProjectRestrictNode:
+  * !ProjectRestrictNode: If created during preprocessing, table number will only be set
in {{{FromSubquery.extractSubquery()}}}; in all other cases it will remain -1.
  
+  * !NormalizeResultSetNode: Instantiated during bind phase for !InsertNode and !UpdateNode
but table number is not set.  So it will be -1.  However, a !NormalizeResultSetNode never
appears in an optimizer's {{{optimizableList}}}, and thus such a node will never actually
be optimized.
-   If created during preprocessing, table number will only be set in
-   FromSubquery.extractSubquery(); in all other cases it will remain -1.
  
-  * NormalizeResultSetNode:
+ For details on how table numbers are used in the [http://wiki.apache.org/db-derby/LanguageOptimizer
various phases] of Derby optimization, interested readers may want to read the ReferencedTableMaps
and/or PredicatePushdown pages of this wiki.
  
-   Instantiated during bind phase for InsertNode and UpdateNode
-   but table number is not set.  So it will be -1.  However,
-   a NormalizeresultSetNode never appears in an optimizer's
-   optimizableList, and thus such a node will never actually
-   be optimized.
- 

Mime
View raw message