db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-2998) Add support for ROW_NUMBER() window function
Date Mon, 25 Feb 2008 21:52:54 GMT

    [ https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572260#action_12572260

A B commented on DERBY-2998:

Hi Thomas, thanks for your continued work on this.

> Sorry about that :|

Not a problem :)

I haven't had a chance to look at the failing queries that you refer to in test7, but I did
take a look at patch 15.  Some comments/questions after my first run through are below.

1)  There are a lot of places with logic similar to the following:

  ResultColumn rc = (ResultColumn) resultColumns.elementAt(index);
  if (rc.getExpression() instanceof WindowFunctionColumnNode)

Since WindowFunctionColumnNode extends ResultColumn, I wonder if it would be worth it to add
a new "isWindowColumn()" method (or something like that) to ResultColumn.java and let that
method contain all of the necessary logic for finding a WindowFunctionColumnNode.  If that
was done then the above type of "if" expression becomes simply:

  ResultColumn rc = (ResultColumn) resultColumns.elementAt(index);
  if (rc.isWindowColumn())

which makes it easier to extend the check for "window columns" in the future if needed.  For
example, would it ever be necessary to walk further down the ResultColumn's expression chain
to look for a RowNumber column?  That is, could we ever have:

     -> VirtualColumnNode
        -> ResultColumn
          -> ...
            -> RowNumberColumnNode

If so, then the "isWindowColumn()" as defined in ResultColumn.java could include the necessary
logic to walk the chain, which would simplify the calling code.  See, for example, comments
#5 and #6 below.


In WindowNode.bind() there is the following:

  /* At this stage there is hopefully a PRN below us as the fromList */
  return this;

It might be good to add an ASSERT here to make sure that the 'hopefully' condition is satisified...


Many of the fields in WindowNode (and other new classes added by this patch) are "public",
but some of them are never referenced outside of WindowNode. Unless a "public" declaration
is functionally necessary, it seems like it would be safer to make them private and then supply
the necessary getter/setter methods where needed, to avoid accidental modification.


In WindowNode.java there is a field "level" which is initialized to -1 and then is set from
SelectNode.java.  But as far as I can tell there are no comments explaining what that field
does.  I think I was able to figure it out from the code, but it'd be nice to have explanatory
comments within WindowNode itself.  And per comment #3 above, I think it'd be nice to use
a setter method in SelectNode instead of direct assignment to the public field.


In ColumnReference.java there are two different kinds of checks to see if the ColumnReference
is pointing to a WindowColumnNode.  The first is via the method "pointsToWindowFunction()",
which does an instanceof check on the source ResultColumn's expression and returns true if
it is a WindowFunctionColumnNode.  The second is in the categorize() method, where the logic
only evaluates to true if the source ResultColumn's expression is a *VirtualColumnNode* whose
source ResultColumn's expression is a WindowFunctionColumnNode.  So the checks do not appear
to be equivalent.  Is that intentional?  If so, then explanatory comments might be good here.
 Also, I believe this is a good example of the kind of code that might benefit from an "isWindowColumn()"
function on ResultColumn, per comment #1 above.


In ResultColumnList.java, there is the following:

    /* Window function columns are added by the WindowResultSet */
    if (sourceExpr instanceof WindowFunctionColumnNode ||
        (sourceExpr instanceof VirtualColumnNode &&
            instanceof WindowFunctionCumnNode)))

See comment #1 above for an alternate (cleaner?) approach...


The result set statistics node for WindowResultSet does not currently print out the name of
the result set...(I don't think).  This is pretty minor, but I was briefly confused when I
saw the RealWindowResultSetStatistics class and yet the query plans weren't showing anything
"window" releated.


If I'm understanding correctly, the execution-time logic for WindowResultSet operates in a
fashion similar to ProjectRestrictResultSet.  That is, it iterates through all of its source's
rows and, for each one, it applies a restriction.  If the row fails the restriction, then
the iteration moves on to the next source row, and so on.  This agrees with the following
comments, pulled from the getNextRowCore() method of WindowResultSet:

    * Loop until we get a row from the source that qualifies, or there are
    * no more rows to qualify. For each iteration fetch a row from the
    * source, and evaluate against the restriction if any.

If this is correct, then I'm not entirely sure we'll get the performance boost that one might
perhaps expect from a query like:

      SELECT row_number() over () as r, t.* FROM T
    ) AS tmp WHERE r <= 3;

I made a similar comment on Februrary 5th that was based on materialization of the source
result set, and that comment ended up being wrong because I misunderstood materialization.
 But in looking at this query again, I think something might still be slightly sub-optimal

If we take the approach of "loop until we get a row that qualifies", then we will loop through
_all_ of the rows in table T, even though only the first 3 satisfy the restriction.  In the
end we will correctly return the first 3 rows (and only the first 3 rows) but not before we've
read all of the others from disk and applied the "<= 3" restriction to each one.  This
can be observed by either a) adding debug printouts to WindowResultSet.getNextRowCore() for
every call to "source.getNextRowCore()", or b) looking at the query plan.  For the above query,
the plan shows the following for the top-most ProjectRestrictResultSet:

  ******* Project-Restrict ResultSet (1):
  Number of opens = 1
  Rows seen = 1280
  Rows filtered = 1277
  restriction = true

So we actually read all 1280 rows from disk, then filtered 1277 of them out so that, in the
end, we only returned 3 rows.  Thus from a performance perspective the ROW_NUMBER() function
does not appear to improve things.  Is that a known limitation of the current development
(which, for the record, would be perfectly acceptable)?

Having said that, there is absolutely _NO_ need for you to boost performance before committing
the changes :)  Incremental development is good and it's great to have the functionality working.
 Performance improvements can always be addressed later, if you (or anyone else) so chooses.
 I'm just raising the issue to make sure that I understand how things are working.


For the query mentioned in comment #8 above, the query plan shows:

  ProjectRestrictResultSet (sees ALL rows, restricts down to 3)
    -> ProjectRestrictResultSet (sees ALL rows)
      -> WindowResultSet (sees ALL rows)

I intuitively expected the restriction to appear at the level of the WindowResultSet, as opposed
to appearing at the level of the top-most PRN.  The plan as shown above makes it look like
the restriction is not being enforced by the Window ResultSet, but is instead being enforced
by the ProjectRestrict. Is that correct?  Or is this just a case where the query plan fails
to capture what is really happening with the WindowResultSet?  I'm guessing the latter, but
am not sure...

> Add support for ROW_NUMBER() window function
> --------------------------------------------
>                 Key: DERBY-2998
>                 URL: https://issues.apache.org/jira/browse/DERBY-2998
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Thomas Nielsen
>            Assignee: Thomas Nielsen
>            Priority: Minor
>         Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff, d2998-12.diff, d2998-12.stat,
d2998-13.diff, d2998-13.stat, d2998-14.diff, d2998-14.stat, d2998-15.diff, d2998-15.stat,
d2998-4.diff, d2998-4.stat, d2998-5.diff, d2998-5.stat, d2998-6.diff, d2998-6.stat, d2998-7.diff,
d2998-7.stat, d2998-8.diff, d2998-8.stat, d2998-9-derby.log, d2998-9.diff, d2998-9.stat, d2998-doc-1.diff,
d2998-doc-1.stat, d2998-test.diff, d2998-test.stat, d2998-test2.diff, d2998-test2.stat, d2998-test3.diff,
d2998-test3.stat, d2998-test4.diff, d2998-test4.stat, d2998-test6.diff, d2998-test7.diff
> As part of implementing the overall OLAP Operations features of SQL (DERBY-581), implement
the ROW_NUMBER() window function.
> More information about this feature is available at http://wiki.apache.org/db-derby/OLAPRowNumber

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message