db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-681) Eliminate the parser's rewriting of the abstract syntax tree for queries with GROUP BY and/or HAVING clauses
Date Fri, 09 Feb 2007 22:27:06 GMT

    [ https://issues.apache.org/jira/browse/DERBY-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471852
] 

A B commented on DERBY-681:
---------------------------

I noticed that this issue is now on the bottom of the "patch available" list--and that the
patch was posted almost two months ago.  So I did a quick review and my comments are below.
 Note: The patch is out of date (not surprisingly); it would be great if a newer version could
be re-attached that incorporates the following comments (I am willing to continue the review/discussion
until the patch is committed).

Oh, and thank you for the "notes.txt" file, Manish!  I found it to be very helpful as I went
through the code changes.  In that document you wrote (at the very end):

> Also the one thing that I cannot explain in this patch: the fetch size
> in some ResultSet nodes has gone from 1 to 16. If this is serious I can
> dig in to see why this is happening. 

I think the difference between a fetch size of 1 and a fetch size of 16 comes down to the
difference between a TableScan and BulkTableScan.  I did a quick search of the codeline and
it looks like Derby will disable bulk fetching if the result set is "ordering dependent" (see
SelectNode.genProjectRestrict()) and also for certain min/max optimizations in a group by
(see the "considerPostOptimizeOptimization()" method of GroupByNode).  My guess is that your
changes have somehow made it so that we no longer need to disable bulk fetch for certain queries,
and thus you are now seeing a different fetch size.  This seems particular likely since all
of the queries that show a 1 vs 16 difference in aggregateOptimization() come under the heading
of "group by ordered on grouping columns".  So long story short, my feeling is that this is
an acceptable diff...

Hopefully someone will correct me if I'm wrong...

Also (w.r.t to "notes.txt" as attached to this issue):

  -- The notes you wrote for "Background on Group By" would be great as
  javadoc comments in the GroupByNode.addNewColumnsForAggregation() method
  (in addition to what's already there).

  -- The notes you wrote for "Having clause -> Design" would be great as
  comments in the GroupByNode.addAggregateColumns() method (perhaps just
  before the "if (havingClause != null)"...)

Other review comments (note: I haven't done much actual testing yet, I've just looked at the
code changes; I hope to do more testing of the changes next week...):

I think it would good if the following issues could be addressed before commit:

  1) FromBaseTable.java:

  -- It looks like you added an "accept()" method to FromBaseTable that
  overrides ResultSetNode.accept().  I noticed that ResultSetNode.accept()
  recursively calls "accept" on "resultColumns", but the new method in
  FromBaseTable does not.  This means that in cases where we used to accept
  visitors for base table result columns we will no longer do so.  I don't
  know what the effects of that might be, but I think that's probably not
  good.  It would perhaps be better to call "super.accept()" at the start
  of FromBaseTable.accept() and then go from there.

  2) GroupByNode.java:

  -- init() method has some lines that are commented out.  You mention
  these in your "notes.txt" file but there is no explanation as to why
  they are commented out (and not just deleted) in the file itself;
  might be good to add such comments (you could just take them from
  notes.txt).

  3) ResultColumnList.java:

  -- With the following diff:

-        int size = (size() > sourceRCL.size()) ? size() : sourceRCL.size();
+//      int size = (size() > sourceRCL.size()) ? size() : sourceRCL.size();
+        int size = Math.min(size(), sourceRCL.size());

  The original line (which is now commented out) looks like it assigns size
  to be the *max* of size() and sourceRCL.size(); but your changes make it
  use the minimum.  I looked at the code and I think your fix is correct--
  i.e. that we should be using the minimum size.  So should the other line
  just be deleted (instead of commented out)?  Also, there appears to be an
  implicit assumption that if the lists are two different sizes then the
  shorter one must correspond to the _leading_ columns of the longer one.
  If you're so inclined it might be nice to add a comment saying as much
  (to go along with your change here).

  4) ColumnReference.java:

  -- Unrelated (and perhaps accidental) code cleanup diff; better to leave
  this out of the patch.

  5) GroupByExpressionTest.java:

  -- In the "testSubSelect()" method of you added a test case that is
  identical to the one immediately preceding it (so far as I can tell).
  Was that intentional, or is there a HAVING clause missing?

The rest are nit-pick issues that I hope you might consider, though they should not block
commit of the patch:

  6) SelectNode.java:

  -- Some unrelated code cleanup; not a big deal, but as a general rule
  it detracts from the patch (makes it hard to figure out what changes
  are related to the issue and what changes are not).  Ex. See the
  bindNonVTITables() method.

  -- I wonder if we really need a new "init()" method here?  As far as I
  can tell there are only two files that currently create a SELECT node:
  DeleteNode.java and sqlgrammar.jj.  The latter uses the new init()
  method, the former uses the old one--but the only difference is the
  presence of the "havingClause" argument.  DeleteNode already passes
  in several null values (with associated comments), so would it make
  sense to just add a null for "havingClause", as well?  Then we would
  only need the one "init()" method for SelectNode.  I admit, though,
  that this a stylistic detail and the code as you have it is correct.
  So if you prefer to leave it as it is, that's fine--although it might
  might be good to add some minimal javadoc to the new init() method
  (ex. "@param havingClause The HAVING clause, if any", to keep in line
  with the old init() method).

  7) GroupByNode.java:

  -- Unnecessary import of JBitSet and Properties.

  -- Javadoc for "init()" method doesn't mention "nestingLevel"

  -- Looks like the following comment was deleted:

-   // finally reset gbc to its new position.

    Was that intentional?

  8) GroupByList.java:

  -- Might be nice if you could add some comments for the following line:

+    selectRCL.setCountMismatchAllowed(true);

    i.e. maybe explain briefly why count mismatches are allowed for GroupBy?

  9) ResultColumnList.java:

  -- With your changes there is now the following "if" statement in "propagateDCLInfo()":

    /* Do both lists, if supplied by user, have the same degree? */
    if (derivedRCL.size() != size() &&
        ! derivedRCL.getCountMismatchAllowed())
    {
        if (visibleSize() != derivedRCL.size()) {
            throw StandardException.newException(SQLState.LANG_DERIVED_COLUMN_LIST_MISMATCH,
tableName);
        }
    }

  Could this be rewritten to:

    /* Do both lists, if supplied by user, have the same degree? */
    if (derivedRCL.size() != visibleSize() &&
        ! derivedRCL.getCountMismatchAllowed())
    {
        throw StandardException.newException(SQLState.LANG_DERIVED_COLUMN_LIST_MISMATCH, tableName);
    }

As I said I haven't looked much at the new test cases nor have I done much testing myself.
 I will hopefully have time to do more of that next week...

> Eliminate the parser's rewriting of the abstract syntax tree for queries with GROUP BY
and/or HAVING clauses
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-681
>                 URL: https://issues.apache.org/jira/browse/DERBY-681
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Rick Hillegas
>         Assigned To: Manish Khettry
>         Attachments: 681.patch.txt, notes.txt
>
>
> If a query contains a GROUP BY or HAVING clause, the parser rewrites the abstract syntax
tree, putting aggregates into a subselect and treating the HAVING clause as the WHERE clause
of a fabricated outer select from the subquery. This allows the compiler to re-use some machinery
since the HAVING clause operates on the grouped result the way that the WHERE clause operates
on the from list. Unfortunately, this rewriting creates an explosion of special cases in the
compiler after parsing is done. The rewriting is not systematically handled later on in the
compiler. This gives rise to defects like bug 280. We need to eliminate this special rewriting
and handle the HAVING clause in a straightforward way. This is not a small bugfix but is a
medium sized project.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message