db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Manish Khettry" <manish.khet...@gmail.com>
Subject Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.
Date Tue, 25 Jul 2006 23:02:00 GMT
On 7/24/06, Army <qozinx@gmail.com> wrote:
> Hi Manish,
> I have to be honest and say that I haven't taken the time to fully
> understand
> the "guts" of this patch--namely, the part where you rewrite the group-by
> tree
> in GroupByNode.addAggregateColumns().  But I did do some review of the
> overall
> patch and I have the following comments.  If you can address these I think
> the
> resultant patch will be a lot easier to read (and will also be smaller :)
> so
> that I or anyone else reviewing can focus more on the "guts" of what this
> patch
> is really doing...
> ------------------------------------------------------------------
> 0) Patch doesn't apply cleanly with latest codeline--there appear to be
> conflicts with the JUnit test changes and also one conflict in
> sqlgrammar.jj.  I
> manually resolved these conflicts in my local codeline so that I could run
> some
> tests/view the code, but I think it'd be appropriate to post an updated
> patch
> (sync'd with the latest codeline) to ease the review process for others.

I have updated the bug with a new patch after resolving the conflict.

1) TernarnyOpNode.isEquivalent(): I think we need to check "receiver"
> equivalence as well, in addition to left and right operands.  For example,
> I did
> the following and I received an error as expected:

yes, you are right on this one. The new patch fixes this.

2) Aren't we losing information with this change?
> In VerifyAggregateExpressionsVisitor.java:
> -                               throw
> StandardException.newException(
> cr.getSQLColumnName());
> +                               throw
> StandardException.newException(SQLState.LANG_INVALID_GROUPED_SELECT_LIST);
> Before your changes we see the column reference name:
> ij> select substr(vc, 3, 4) from s1 group by vc2;
> ERROR 42Y36: Column reference 'VC' is invalid.  For a SELECT list with a
> BY, the list may only contain grouping columns and valid aggregate
> expressions.
> But after the patch is applied, we lose the name of the invalid column
> reference:
> ij> select substr(vc, 3, 4) from s1 group by vc2;
> ERROR 42Y30: The SELECT list of a grouped query contains at least one
> invalid
> expression. If a SELECT list has a GROUP BY, the list may only contain
> valid
> grouping expressions and valid aggregate expressions.
> Generally it seems like a good idea to provide the user with as much
> information
> as possible for error conditions like this.  Is it possible to preserve
> the
> result column name in cases like this (i.e. cases where we do actually
> know what
> the invalid column reference is)?

Actually the existing error message no longer makes sense with group by
expressions. For example, the line of code that you point to, also catches
errors like this:
 select c1+1,  avg(c2) from test group by c1+10;
Does the existing error message make sense for this query?

3) In UnaryOperatorNode.java the check for equivalence of the operand has
> three
> parts to it:
> +       return (operator.equals(other.operator)
> +                       || (operand == other.operand)
> +                       || ((operand != null) && operand.isEquivalent
> (other)));
> But for other classes--such as BinaryOperatorNode.java--the check is only
> a
> single call to "isEquivalent" for each operand:
> +               return methodName.equals(other.methodName)
> +                      && leftOperand.isEquivalent(other.leftOperand)
> +                      && rightOperand.isEquivalent(other.rightOperand);
> Can you explain why the operand for UnaryOperatorNode has three checks
> (.equals(), "==", and .isEquivalent()) while the operands for
> BinaryOperatorNode
> only have a single check (.isEquivalent())?  Some comments to describe the
> difference might be useful in these classes.

If you look at bindUnaryOperator for UnaryOperatorNode, you will see that
it is possible for operand to be null; not so in BinaryOperatorNode.

3) I think there are several places in this patch where you have comments
> explaining "what" by not explaining "why"--and I think the latter would be
> helpful.  For example:
> In VerifyAggregateExpressionsVisitor.java:
> Why do we disallow expressions involving native java computation in the
> following code?
> +    } else if (node instanceof JavaToSQLValueNode)
> +    {
> +      // disallow any expression which involves native java computation.
> +      throw StandardException.newException( (groupByList == null) ?

It is not possible to compare java expressions for equivalence. I will
update the comments.

Why do we not visit children under subqueries in the following code?
>         /**
> -        * Don't visit children under an aggregate
> +        * Don't visit children under an aggregate, subquery or any node
> which
> +        * is equivalent to any of the group by expressions.
> In SelectNode.java:

There is no code change here-- I just updated the comment to reflect
existing behavior. Honestly, I do't know why we do not visit subqueries.

Why do we need to add a call to preprocess in the following code?
> +    if (groupByList != null)
> +    {
> +      groupByList.preprocess(numTables, fromList, whereSubquerys,
> wherePredicates);
> +               }
> +
> For this one you can probably just use the comments that you already
> included in
> your patch description, namely: "This is needed because expressions can
> get
> rewritten in the select list but not in the grouping list causing problems
> when
> the result set is generated."  It'd be great if this explanation was
> included in
> the actual code.

Makes sense. done.

In GroupByNode.java:
> Why do we always say that the source is not in sorted order if we have an
> expression instead of a ColumnReference?
> -        crs[index] = gc.getColumnReference();
> +        if (gc.getColumnExpression() instanceof ColumnReference)
> +        {
> +          crs[index] = (ColumnReference)gc.getColumnExpression();
> +        }
> +        else
> +        {
> +          isInSortedOrder = false;
> +          break;
> +        }

If you look at how isInSortedOrder is used or look at the javadoc for
ResultSet#isOrderedOn, you will see that this optimization applies only for
column references.

Also, in the comment you posted to DERBY-883 with the patch you explained
> how
> the rewrite is different now:
> > o the rewrite logic is a bit different now. Earlier, we would change
> > each unaggregate ColumnReference in the select list and point it to the
> > GroupByColumn. Now we replace each group by expression that we find
> > in the projection list with a virtual column node that effectively
> points
> > to a result column in the result set doing the group by. In addition
> > the original routine which did the rewrite is now split up into two
> separate
> > and smaller routines: addUnAggColumns and addAggregateColumns.
> It would be great if this explanation could be added to the relevant parts
> in
> the code and/or if the comments for that section could be updated to match
> the
> new behavior.  This will make it easier for people to understand what is
> going
> on when they read the code several years from now.

OK,  done.

In GroupByList.java:
> Why don't we need to add a ResultColumn/ColumnReference pair to
> SelectNode's RCL
> when the grouping column is an expression (as opposed to a
> ColumnReference), as
> in the following code?
> @@ -185,7 +174,8 @@
>      /* If no match found in the SELECT list, then add a matching
>       * ResultColumn/ColumnReference pair to the SelectNode's RCL.
>       */
> -       if (! matchFound)
> +       if (! matchFound &&
> +           groupingCol.getColumnExpression() instanceof ColumnReference)
>         {
>                 ResultColumn newRC;

That is a good question-- this bit of code add invisible column references
when a column reference in the group by list, is not in the projection list;

select c2, sum(c3) from test group by c1;

will add c1 to the projection list as a generated column. I'm not sure if
you want to do the same kind of rewrite for expressions. I thought this
wasn't worth doing but I'm not sure anymore.

These are just some examples; I think there are others too.  If you can look
> again at the patch and try to add comments explaining *why* we do or do
> not need
> to do things like this, that would be a huge help to those reading the
> code now
> and in the future.  These don't have to be extensive--even just a line or
> two
> can be very helpful.
> 4) In GroupByColumn and GroupByList you have deleted a bunch of methods
> that
> apparently are unused.  While this is a useful thing to do, I don't know
> if we
> want to do that as part of the DERBY-883 patch--it makes the diff bigger
> and
> doesn't add to the patch functionally.  Instead you could file a new Jira
> entry
> and attach a separate patch that removes the unnecessary methods to that
> Jira
> issue, to be reviewed/committed separately.  Or you could create a
> follow-up
> patch for this issue that just separates out the changes for removing
> these methods.

Most if not all of the routines which are deleted were made redundant by
this fix. For example, getColumnReference in GroupByColumn is not only
unused it makes no sense with group by expressions. I think its good to get
rid of this code but if you want, I'd be perfectly willing to add dead code
back in the source tree.

I disagree with you here-- when a contributor has spent a considerable
amount of time understanding and making a lot of changes to a class, it
makes sense to remove dead code at the same time. If you don't, stuff like
this will likely get ignored or slip through the cracks, imho.

5) Extraneous diff in QueryTreeNode.java?  Are any of these changes actually
> required or are they just there for debugging?  If they are an intentional
> part
> of the patch then a) comments explaining these changes would be helpful,
> and b)
> it might be good to actually delete the commented out lines altogether,
> instead
> of leaving them in to clutter the code.

I've removed this from the new patch.

6) New methods could use javadoc explaining in more detail what they do:
> GroupByNode.addUnAggColumns()
> GroupByNode.addAggregateColumns()

OK, done.

7) There are a couple of places where you have commented lines out instead
> of
> deleting them.  If there's a reason you left these in, then comments
> explaining
> that would be great; otherwise I think it's cleaner to just delete the
> lines
> altogether.  For example:
> In GroupByColumn.java:
> @@ -48,7 +48,8 @@
>          */
>         public void init(Object colRef)
>         {
> -               this.colRef = (ColumnReference) colRef;
> +               //this.colRef = (ColumnReference) colRef;
> +               this.columnExpression = (ValueNode)colRef;
>         }
>         /**

yes, you're right. I've removed it.

8) I think the diff for master/subquery.out has some stuff from a failed
> merge
> in it:

yes, this is an old diff hanging around; removed it.

> A. Unnecessary white-space changes?

Atleast some of these were done to improve existing indentation. Next time,
I'll resist the urge to improve readability.

> B. Lines longer than 80 chars:

I try hard to to keep things within 80 chars with tab stop set to 4 chars.
Do you see any changes where this is not the case?

Thanks for taking the time to go through these changes. I've attached a new
patch which incorporates most, if not all of your feedback. Other minor
changes from the first patch-- moved the junit test case to a different
package, and added more javadoc comments to the new method in
BaseJDBCTestCase and made the indentation conform to the one in this file
which does not seem to use tabs.


View raw message