db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Manish Khettry" <manish.khet...@gmail.com>
Subject Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.
Date Fri, 28 Jul 2006 21:14:54 GMT
On 7/26/06, Army <qozinx@gmail.com> wrote:
>
> Hi Manish,
>
> Thanks for addressing my comments.  The patch is looking better
> now--especially
> with the whitespace issues in version 4.   Some follow-up comments below.
>
> Manish Khettry wrote:
> >> 1) TernarnyOpNode.isEquivalent (): I think we need to check "receiver"
> >> equivalence as well, in addition to left and right operands.
> >
> > yes, you are right on this one. The new patch fixes this.
>
> Thanks for fixing this.  If possible, could you add a test case to your
> JUnit
> test to demonstrate?  Could just be the query that I posted in my previous
> comments.


I have added a couple of queries to make sure that
TernaryOpNode#isEquivalent is tested.


> Actually the existing error message no longer makes sense with group by
> > expressions. For example, the line of code that you point to, also
> catches
> > errors like this:
> > select c1+1,  avg(c2) from test group by c1+10;
> > Does the existing error message make sense for this query?
>
> Not with this query, no.  But two things to note here:
>
> 1) Before your patch, if you replace "group by c1+10" with "group by c2"
> you'll
> get error 42Y36 saying that column reference C1 is invalid, even though
> it's
> actually part of an expression ("c1+1")--so that type of behavior *does*
> already
> exist, even if it doesn't really "make sense".  If you change the SQLSTATE
> and
> error message for this case, the result is a change (albeit a pretty small
> one)
> in customer apps, which is really the issue here.
>
> 2) The key expression in my previous comment was "in cases like this",
> where
> "like this" meant queries where the existing error message *does* make
> sense.
>
> For example, with the following query it makes sense to include the column
> reference because we know what it is:
>
> ij> select c1 from test group by c2;
> ERROR 42Y36: Column reference 'C1' is invalid.  For a SELECT list with a
> GROUP
> BY, the list may only contain grouping columns and valid aggregate
> expressions.
>
> And that's what we'll get before your patch.  But after your patch we a)
> get a
> different SQLSTATE and b) lose the column reference name:
>
> ij> select c1 from test group by c2;
> ERROR 42Y30: The SELECT list of a grouped query contains at least one
> invalid
> expression. If a SELECT list has a GROUP BY, the list may only contain
> valid
> grouping expressions and valid aggregate expressions.
>
> So now we've changed the way an existing query behaves (by returning a
> different
> SQLSTATE)--which (I think?) means we need to mark this as "Existing
> Application
> Impact"--*and* we're now providing the user with less information.


I see your point but I would prefer to keep it this way, first because the
error message in question 42Y36-- even though it gives more information, is
misleading; specifically this phrase: "the list may only contain grouping
columns .....". I also don't like error handling for a subset of the cases
taking recourse to instanceof. I'll throw it up to the community to weigh
in.

I can see that it would be useful to point out which expression is invalid
as opposed to saying ".. atleast one invalid expression....". If the
community wants this, then I'd like to be done with this patch and file
another issue to specifically improve the error handling of this case.

If it's possible to figure out when we have a simple column reference (c1)
> verses an expression (c1+1) and to throw a corresponding error, I think
> that
> would be ideal.  But if you decide that it's not possible or not worth it
> to
> preserve the error for queries with simple column references, then feel
> free to
> leave this as it is.  If that's your decision, though, I think you should
> ask
> derby-dev if a different SQLSTATE for the same query counts as "Existing
> Application Impact", and if so, mark this issue as appropriate.
>
> >> + return (operator.equals(other.operator)
> >> +        || (operand == other.operand)
> >> +        || ((operand != null) && operand.isEquivalent(other)));
> >
> > If you look at bindUnaryOperator for UnaryOperatorNode, you will see
> that
> > it is possible for operand to be null; not so in BinaryOperatorNode.
>
> So the assumption here is that if (operand == other.operand) then both
> operands
> are null?  If so, a simple comment saying that could be helpful.


OK. I think the code speaks clearly for itself (the second part of the ||
checks for != null before calling isEquivalent) but I've added the comment.
I like to add comments if they really clarify something that the code may
not. Usually the code changes and the comments don't and thats worse than no
comments at all. Those XP guys were onto something!

> In GroupByList.java:
> <snip code snippet>
> >
> > That is a good question-- this bit of code add invisible column
> references
> > when a column reference in the group by list, is not in the projection
> > list;
> > i.e.
> >
> > select c2, sum(c3) from test group by c1;
> >
> > will add c1 to the projection list as a generated column. I'm not sure
> if
> > you want to do the same kind of rewrite for expressions. I thought this
> > wasn't worth doing but I'm not sure anymore.
>
> I just tried this query with and without your patch and it fails in both
> cases
> with error 42Y36/42Y30.  So I guess I'm a little confused as to why Derby
> would
> add "c1" to the projection list and what good it does us?



Sorry, wrong query, try

select c2, sum(c3) from test group by c1,c2;

If you're not sure whether or not it's worth adding equivalent logic for
> expressions, do you plan to follow-up with this later?  If not, a comment
> explaining what you have found (or what your uncertainties are) might help
> other
> developers down the road (esp. if a bug comes up in this area later).


I'll add a comment. My preference would be to leave as is and evaluate it
later.


> > I disagree with you here-- when a contributor has spent a considerable
> > amount of time understanding and making a lot of changes to a class, it
> > makes sense to remove dead code at the same time. If you don't, stuff
> like
> > this will likely get ignored or slip through the cracks, imho.
>
> I didn't mean to say that the changes were not useful nor that they
> shouldn't be
> committed; just that it might make it easier to track "cleanup" vs
> "functional"
> changes by filing a separate Jira issue to remove the dead code and
> posting an
> isolated patch to that issue.  The changes will still go in--no slipping
> through
> the cracks--but they will go in as part of their *own* Jira commit, which
> seems
> cleaner to me.
>
> In any event, that was just a suggestion.  No need to let this prevent the
>
> commit of the patch.


OK, I see your point although the overhead of a new jira issue, code reviews
and such is a bit high, imho.

> 7) There are a couple of places where you have commented lines out instead
> >
> > yes, you're right. I've removed it.
>
> There are still a couple of these lingering:


I got rid of both of them.

>> A. Unnecessary white-space changes?
> >
> > Atleast some of these were done to improve existing indentation. Next
> time,
> > I'll resist the urge to improve readability.
>
> Thanks for cleaning a bunch of these whitespace diffs up in the version
> four
> patch.  That said, there are still quite a few white-space changes
> left--did you
> intentionally leave those in?  Ex.
>
> GroupByNode.java:
>
> @@ -470,10 +489,10 @@
>                         ** bottom project restrict.
>                         */
>                         newRC = (ResultColumn) getNodeFactory().getNode(
> -
> C_NodeTypes.RESULT_COLUMN,
> -
> "##aggregate result",
> -
> aggregate.getNewNullResultExpression(),
> -
> getContextManager());
> +                                       C_NodeTypes.RESULT_COLUMN,
> +                                       "##aggregate result",
> +
> aggregate.getNewNullResultExpression (),
> +                                       getContextManager());
>                         newRC.markGenerated();
>                         newRC.bindResultColumnToExpression();
>                         bottomRCL.addElement (newRC);


Maybe one or two slipped by-- this one does reduce the indentation and its
possible it was going past the right margin and I decided to fix it. Some of
the others really served no purpose and I got rid of them.

1 -- SubstituteExpressionVisitor.java:
>
>   * The copyright says 1998, 2004; I think this is supposed to just say
> 2006
> since that's when the file was created.


OK.

2 -- I see that in version 4 of the patch you updated UnaryOperatorNode to
> have
> the following logic in isEquivalent():
>
> +               UnaryOperatorNode other = (UnaryOperatorNode)o;
> +               return (operator.equals(other.operator) &&
> +                       ((operand == other.operand)||
> +                        ((operand != null) && operand.isEquivalent(
> other.operand))));
>
> Did you become aware of this problem because of some test(s) that you were
> running?  If so, it'd be great if you could add the actual test case(s) to
> your
> JUnit test for regression purposes.  The more tests there are to check for
> these
> kinds of subtleties, the better...


Not tests, just visual inspection. I was going over the email exchange when
it hit me. I thought about tests and the only one case where the operand is
null is count(*)-- since isEquivalent will only be called for grouping
expressions, we can never exercise this particular branch of the
conditional.

And finally:
>
> > Manish Khettry updated DERBY-883:
> > ---------------------------------
> >
> >     Attachment: 883.patch4.txt
> >
> > I found a problem in one of my changes. Also got rid of a lot of
> > white space only diffs.
>
> In the future it would be helpful if you could say what changed between
> versions
> of the patch, so that reviewers don't have to scour the patches to try to
> figure
> out what changed.  I managed to find out what "a problem" meant in this
> case,
> but it's generally a good idea to explicitly state the problem/changes in
> the
> comments.  Makes my life as a reviewer easier :)


OK. I will attach a  new patch shortly incorporating your comments and the
NPE that Yip found. Also, just as an fyi-- I will be away all of next week,
so if you have additional comments, I can only get to them, the second week
of august.

m

Mime
View raw message