db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "OLAPRowNumber" by ThomasNielsen
Date Tue, 05 Feb 2008 12:02:49 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by ThomasNielsen:
http://wiki.apache.org/db-derby/OLAPRowNumber

------------------------------------------------------------------------------
  The ROW_NUMBER() window function is currently not implemented, and thereby not supported.
  
  == Proposed Changes ==
+ 
+ 
  ''DRAFT''
  
+ === Overview ===
+ The changes involved to implement ROW_NUMBER() will touch several areas of the Derbys engine.
This work includes getting some of the framework needed for future window function extensions
in place.
- The steps we propose taking are
-  * The new syntax must be added to the SQL grammar 
-  * A {{{RowNumberColumnNode}}} is added to the querytree for every instance of {{{the ROW_NUMBER()}}}
in the query during compilation
-  * During code generation in {{{ResultColumnList.generateCore()}}} we trap instances of
{{{RowNumberColumnNode}}} in the querytree and generate code to virtually invoke {{{BaseActivation.getSetRowNumberValue()}}}
-  * During execution of the generated code {{{BaseActivation.getSetRowNumberValue()}}} handles
the incrementation and caching of rownumber values based for a given column.
-  * Optimization should cause the execution to stop once possible {{{WHERE}}} clauses are
satisfied.
  
+ The largest changes will likely be:
+  * The new syntax must be added to the SQL grammar.
+  * New query tree node subclasses must be added for the window specification and window
function columns.
+  * Optimization should avoid flattening and enable materialization.
+  * A new result set class implementing the window functions must be added.
+  * Statistics gathering must be implemented.
+ 
+ Implementation of the ROW_NUMBER() window function is likely to affect other areas as well
but to a lesser extent, often only as helper methods.
+ 
+ For the first increment we will only implement support for empty, unnamed windows. This
means a window spanning the complete result set.
+ 
+ === Query processing ===
+ 
+ The new ROW_NUMBER() syntax is added to the SQL grammar in {{{sqlgrammar.jj}}} to support
empty, unnamed windows.
+ 
+ While walking the ResultColumns of a SelectNode, we add a RowNumberColumnNode to the ResultColumnList
in the AST. The window specification is attached as a WindowNode below the RowNumberColumnNode.
This is analogous to how a FROM list, a ORDER BY clause, etc, is done for the SelectNode itself.
The WindowNode takes a set of parameters including the name, partition definition, ordering
info, and the window frame definition. The WindowNode should serve as a basis for future support
of named windows.
+ 
+ The interesting part of the AST looks like this for a SelectNode with a window function
once parsing is completed:
+ 
+ {{{
+ ...
+  |
+ SelectNode
+  * ...
+  * orderByList -> ...
+  * resultColumnList -> * ResultColumn : table T, column A
+  |                     * ...
+  |                     * RowNumberColumnNode -> WindowNode
+  |                     * ...
+  | 
+ ...
+ }}}
+ 
+ === Query Optimization ===
+ 
+ ==== Avoid flattening ====
+ We avoid flattening queries with window functions during optimization.
+ The main rationales behind this are
+  * avoid rewrite of the query
+ 
+ {{{
+    SELECT * FROM (
+           SELECT row_number() over () as r, t.* FROM T
+    ) AS tmp WHERE r <= 3;
+ }}}
+ 
+ into
+ 
+ {{{
+    SELECT row_number() over () as r, t.* FROM T WHERE R <= 3;
+ }}}
+ 
+ which will fail as '''r''' is not a column in table '''T'''. 
+ 
+  * window function results that span multiple rows, like a moving average, will benefit
from materialization. For the nested select query above we materialize the subquery select
result, and have the outer SELECT pull rows from the materialized result.
+ 
+ ==== Modification of access paths ====
+ At the last stages of optimization the AST is made into a queryplan by modification of access
paths. 
+ 
+ At this stage a SelectNode is exchanged for a ProjectRestrictNode (PRN) that handles projection
and restriction, and possible index use is considered etc. 
+ The SelectNode clauses ORDER BY, GROUP BY, and FROM are made into their respective nodes
in the queryplan. See example.
+ 
+ Given this select node in the AST
+ 
+ {{{
+ ...
+  |
+ SelectNode
+  * ...
+  * From -> ...
+  * OrderBy -> ...
+  * GroupBy -> ...
+  * Distict
+  * Where -> ...
+  * resultColumnList -> * ResultColumn: table T, column A
+  |                     * ...
+  |                     * RowNumberColumnNode -> WindowNode
+  |                     * ...
+  | 
+ ...
+ }}}
+ 
+ this is made into the following queryplan:
+ 
+ {{{
+ ProjectRestrictNode
+  |
+ WindowNode -> ...
+  |
+ OrderByNode -> ...
+  |
+ DistinctNode -> ...
+  |
+ GroupByNode -> ...
+  |
+ ProjectRestrictNode
+  |
+ <from table T>
+ }}}
+ 
+ A PRN is inserted on top of whatever is in the select FROM clause. This PRN does projection
of the non-window function columns, and applies any restriction (where clause) on these columns.
To the top of this PRN we add any GroupByNode, DistinctNode, and OrderByNode as needed.
+ 
+ In the case of a window function column we then pull the WindowNode up from under the RowNumberColumnNode,
and merge this with the result we get from below. The WindowNode needs to go on top so that
ordering, grouping etc are performed properly. We add one WindowNode for each window function
column, and they are evaluated left to right in the result column list, with the right most
column being the top WindowNode in the query plan.
+ 
+ Yet another PRN is used to top off the upper WindowNode. This PRN is effectively a no-op,
and is eliminated during code generation. The PRN is mainly there due to other parts of the
code expecting the top node to be a PRN once we are done with the access path modification.
+ 
+ === Code generation === 
+ 
+ During code generation, we lay out code to generate a WindowResultSet, and pass the source
resultset information as well as column info to it. The logic for ROW_NUMBER() over the empty
window specification is simply a counter in the WindowResultSet, so this change should be
seen as part of the window function framework. The intended use of this result set is mainly
other window functions that may need row caching to calculate moving averages over a window
or other features and functionallity needed for the window function being implemented.
+ 
+ === Code execution ===
+ 
+ During code execution the WindowResult behaves as a regular part of the ResultSet chain
in the engine. Rows are fetched from this ResultSet with a call to getNextRowCore() which
in turn fetches rows from the lower/child ResultSet in similar fashion. The window function
column is added, and the row passed up the chain.
+ 

Mime
View raw message