db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Army <qoz...@gmail.com>
Subject Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.
Date Wed, 26 Jul 2006 21:12:14 GMT
Hi Manish,

Thanks for addressing my comments.  The patch is looking better now--especially 
with the whitespace issues in version 4.   Some follow-up comments below.

Manish Khettry wrote:
 >> 1) TernarnyOpNode.isEquivalent(): I think we need to check "receiver"
 >> equivalence as well, in addition to left and right operands.
 > yes, you are right on this one. The new patch fixes this.

Thanks for fixing this.  If possible, could you add a test case to your JUnit 
test to demonstrate?  Could just be the query that I posted in my previous comments.

> Actually the existing error message no longer makes sense with group by
> expressions. For example, the line of code that you point to, also catches
> errors like this:
> select c1+1,  avg(c2) from test group by c1+10;
> Does the existing error message make sense for this query?

Not with this query, no.  But two things to note here:

1) Before your patch, if you replace "group by c1+10" with "group by c2" you'll 
get error 42Y36 saying that column reference C1 is invalid, even though it's 
actually part of an expression ("c1+1")--so that type of behavior *does* already 
exist, even if it doesn't really "make sense".  If you change the SQLSTATE and 
error message for this case, the result is a change (albeit a pretty small one) 
in customer apps, which is really the issue here.

2) The key expression in my previous comment was "in cases like this", where 
"like this" meant queries where the existing error message *does* make sense.

For example, with the following query it makes sense to include the column 
reference because we know what it is:

ij> select c1 from test group by c2;
ERROR 42Y36: Column reference 'C1' is invalid.  For a SELECT list with a GROUP 
BY, the list may only contain grouping columns and valid aggregate expressions.

And that's what we'll get before your patch.  But after your patch we a) get a 
different SQLSTATE and b) lose the column reference name:

ij> select c1 from test group by c2;
ERROR 42Y30: The SELECT list of a grouped query contains at least one invalid 
expression. If a SELECT list has a GROUP BY, the list may only contain valid 
grouping expressions and valid aggregate expressions.

So now we've changed the way an existing query behaves (by returning a different 
SQLSTATE)--which (I think?) means we need to mark this as "Existing Application 
Impact"--*and* we're now providing the user with less information.

If it's possible to figure out when we have a simple column reference (c1) 
verses an expression (c1+1) and to throw a corresponding error, I think that 
would be ideal.  But if you decide that it's not possible or not worth it to 
preserve the error for queries with simple column references, then feel free to 
leave this as it is.  If that's your decision, though, I think you should ask 
derby-dev if a different SQLSTATE for the same query counts as "Existing 
Application Impact", and if so, mark this issue as appropriate.

>> + return (operator.equals(other.operator)
>> +        || (operand == other.operand)
>> +        || ((operand != null) && operand.isEquivalent(other)));
> If you look at bindUnaryOperator for UnaryOperatorNode, you will see that
> it is possible for operand to be null; not so in BinaryOperatorNode.

So the assumption here is that if (operand == other.operand) then both operands 
are null?  If so, a simple comment saying that could be helpful.

> In GroupByList.java:
<snip code snippet>
> That is a good question-- this bit of code add invisible column references
> when a column reference in the group by list, is not in the projection 
> list;
> i.e.
> select c2, sum(c3) from test group by c1;
> will add c1 to the projection list as a generated column. I'm not sure if
> you want to do the same kind of rewrite for expressions. I thought this
> wasn't worth doing but I'm not sure anymore.

I just tried this query with and without your patch and it fails in both cases 
with error 42Y36/42Y30.  So I guess I'm a little confused as to why Derby would 
add "c1" to the projection list and what good it does us?

If you're not sure whether or not it's worth adding equivalent logic for 
expressions, do you plan to follow-up with this later?  If not, a comment 
explaining what you have found (or what your uncertainties are) might help other 
developers down the road (esp. if a bug comes up in this area later).

> Most if not all of the routines which are deleted were made redundant by
> this fix. For example, getColumnReference in GroupByColumn is not only
> unused it makes no sense with group by expressions. I think its good to get
> rid of this code but if you want, I'd be perfectly willing to add dead code
> back in the source tree.

I guess it wasn't clear to me how all of the deleted methods were made redundant 
by *this* fix.  In your comment when you posted the patch you just said "unused 
methods zapped", which sounded to me like you just happened to notice that the 
methods were unused and therefore deleted them.  That kind of spontaneous 
"cleanup" is, like the whitespace changes, something that should be 
posted/committed as a separate issue.  If, however, the methods are relevant and 
useful before your changes but redundant and deletable after your changes, then 
it's fine to have them in with this patch.

> I disagree with you here-- when a contributor has spent a considerable
> amount of time understanding and making a lot of changes to a class, it
> makes sense to remove dead code at the same time. If you don't, stuff like
> this will likely get ignored or slip through the cracks, imho.

I didn't mean to say that the changes were not useful nor that they shouldn't be 
committed; just that it might make it easier to track "cleanup" vs "functional" 
changes by filing a separate Jira issue to remove the dead code and posting an 
isolated patch to that issue.  The changes will still go in--no slipping through 
the cracks--but they will go in as part of their *own* Jira commit, which seems 
cleaner to me.

In any event, that was just a suggestion.  No need to let this prevent the 
commit of the patch.

> 7) There are a couple of places where you have commented lines out instead
> yes, you're right. I've removed it.

There are still a couple of these lingering:


  	public String getColumnName()
-		return colRef.getColumnName();
+		//return colRef.getColumnName();
+		return columnExpression.getColumnName();


+	protected boolean isEquivalent(ValueNode o) throws StandardException
+	{
+		if (!isSameNodeType(o)) {
+			return false;
+		}
+		ColumnReference other = (ColumnReference)o;
+		return (tableNumber == other.tableNumber
+				&& columnName.equals(other.getColumnName()));
+		//return source.isEquivalent(other.source);
+	}

>> A. Unnecessary white-space changes?
> Atleast some of these were done to improve existing indentation. Next time,
> I'll resist the urge to improve readability.

Thanks for cleaning a bunch of these whitespace diffs up in the version four 
patch.  That said, there are still quite a few white-space changes left--did you 
intentionally leave those in?  Ex.


@@ -470,10 +489,10 @@
  			** bottom project restrict.
  			newRC = (ResultColumn) getNodeFactory().getNode(
-										C_NodeTypes.RESULT_COLUMN,
-										"##aggregate result",
-										aggregate.getNewNullResultExpression(),
-										getContextManager());
+					C_NodeTypes.RESULT_COLUMN,
+					"##aggregate result",
+					aggregate.getNewNullResultExpression(),
+					getContextManager());

>> B. Lines longer than 80 chars:
> I try hard to to keep things within 80 chars with tab stop set to 4 chars.
> Do you see any changes where this is not the case?

Again, this is a fairly minor thing--you don't have to waste too much time on 
this.  When I look at the patch I still seem some lines that are longer than 80 
chars with a 4-char tab, but as you said this can be editor-dependent.  So long 
as you're aware of the issue and trying to respect it, I'm not going to complain.

Some other things to note:

1 -- SubstituteExpressionVisitor.java:

  * The copyright says 1998, 2004; I think this is supposed to just say 2006 
since that's when the file was created.

2 -- I see that in version 4 of the patch you updated UnaryOperatorNode to have 
the following logic in isEquivalent():

+    		UnaryOperatorNode other = (UnaryOperatorNode)o;
+    		return (operator.equals(other.operator) &&
+			((operand == other.operand)||
+			 ((operand != null) && operand.isEquivalent(other.operand))));

Did you become aware of this problem because of some test(s) that you were 
running?  If so, it'd be great if you could add the actual test case(s) to your 
JUnit test for regression purposes.  The more tests there are to check for these 
kinds of subtleties, the better...

And finally:

 > Manish Khettry updated DERBY-883:
 > ---------------------------------
 >     Attachment: 883.patch4.txt
 > I found a problem in one of my changes. Also got rid of a lot of
 > white space only diffs.

In the future it would be helpful if you could say what changed between versions 
of the patch, so that reviewers don't have to scour the patches to try to figure 
out what changed.  I managed to find out what "a problem" meant in this case, 
but it's generally a good idea to explicitly state the problem/changes in the 
comments.  Makes my life as a reviewer easier :)

Thanks for continuing to improve these patches,

View raw message