db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Miller" <amil...@metamatrix.com>
Subject RE: [jira] Commented: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.
Date Fri, 19 May 2006 04:16:46 GMT
No, we don't go quite that far.  Because most dbs support group by
expressions, we haven't needed to implement this.  

________________________________

From: Satheesh Bandaram [mailto:bandaram@gmail.com] 
Sent: Thursday, May 18, 2006 2:16 PM
To: derby-dev@db.apache.org
Subject: Re: [jira] Commented: (DERBY-883) Enhance GROUP BY clause to
support expressions instead of just column references.


Thanks Alex, for your message and the blogs... It was an interesting
reading. Wonder if MetaMatrix rewrites queries with GROUP BY expressions
for Derby into SELECT subqueries... something like:

select year(hiredate), month(hiredate) as month, count(*)  from
employees 
group by year(hiredate), month(hiredate);

INTO something like:

select years, months, count(*) from (select year(hiredate),
month(hiredate) from employees) emp (years,months) group by years,
months;

This could be used as a workaround until this functionality is added to
Derby.

Satheesh


On 5/18/06, Alex Miller < amiller@metamatrix.com
<mailto:amiller@metamatrix.com> > wrote: 

	I blogged about GROUP BY expression support across dbs a while
back 
	
(http://tech.puredanger.com/2005/03/02/fun-with-expressions-in-a-group-b
	y/ and
	
http://tech.puredanger.com/2005/04/01/update-on-expressions-in-group-by/
	).  Pretty much all the major vendors support expressions in a
GROUP BY.
	I implemented this functionality a while back for the MetaMatrix
query 
	engine and blogged about using MetaMatrix Query to "add" this
	functionality over Derby in
	
http://devcentral.metamatrix.com/blog/alex/2006/02/28/Enhancing-Apache-D

	erby-with-MetaMatrix.
	
	Positional parameters don't make as much sense for GROUP BY as
they do
	for ORDER BY.  Logically, you're executing the clauses in the
order
	FROM-WHERE-GROUP BY-HAVING-SELECT-ORDER BY.  The SELECT clause
is the 
	point at which the output columns of the query are effectively
named and
	ordered.  Positional parameters make sense in ORDER BY because
they
	refer to the output columns of the previous phase, which are
defined as 
	part of the query, but may have no well-defined name to refer to
them
	with.
	
	In the case of GROUP BY, you would be referring backwards from
the
	SELECT clause to the GROUP BY clause, so that seems kind of
goofy.  The 
	only reason to do this would be to avoid referencing a complex
unnamed
	expression.  This is, of course, exactly the sort of thing that
db
	vendors bend the rules about to make SQL more usable.  In fact,
you can
	use GROUP BY positional parameters in MySQL and Postgres but not
in any
	major commercial db that I've tried (Oracle, DB2, SQL Server,
Sybase).
	
	Alex Miller
	Chief Architect
	MetaMatrix
	
	-----Original Message----- 
	From: Satheesh Bandaram (JIRA) [mailto:derby-dev@db.apache.org]
	Sent: Wednesday, May 17, 2006 1:14 PM
	To: derby-dev@db.apache.org 
	Subject: [jira] Commented: (DERBY-883) Enhance GROUP BY clause
to
	support expressions instead of just column references.
	
	    [
	
http://issues.apache.org/jira/browse/DERBY-883?page=comments#action_1241
	2227 ]
	
	Satheesh Bandaram commented on DERBY-883:
	-----------------------------------------
	
	Thanks for making progress on this important new functionality,
Manish. 
	
	> 1. In terms of syntax, do we allow expressions in the group by
list or
	positional parameters, or both?
	>
	> select tomonth(creationdt), toyear(creationdt), count(*) from
bugs
	> group by 1, 2; 
	
	I have seen positional parameters for ORDER BY expressions, not
	typically used in GROUP BY. Looking at both DB2 and Oracle
	documentation, it seems neither support positional parameters.
	
	> An implementation question on this note-- does the language
code have 
	> a way of looking at two expressions (ValueNode?) and checking
to see
	> if they are equivalent? We'll need some way of doing this to
match an
	> expression in the group by list to an expression in the select
list 
	right?
	
	Correct. Don't think there is any existing expression matching
to
	compare two expressions. DB2 docs discuss how group by
expressions are
	matched in SQL reference manual. (Page 484:
	
ftp://ftp.software.ibm.com/ps/products/db2/info/vr82/pdf/en_US/db2s1e81.
	pdf)
	
	> 2. I assume that an expression in a group by list must appear
in the
	> select list without aggregation right? Is this an error? 
	>
	> select x+1, x+2, sum(y)
	> from test
	> group by x
	
	NO... This is a valid query. See the reference I provided above.
	
	> 3. What do we do with duplicates? i.e.
	>
	> select x+1, x+1, sum(y) 
	> from test
	> group by x+1, x+1;
	>
	> Is this an error? The current implementation throws an error
if the
	> same column occurs more than once in the group by list.
	
	I am not sure why Derby currently considers this an error...
Looking at 
	the code, it seems it may be looking for ambiguous column
references
	(like 'x' being part of two different tables in from_list),
which makes
	sense, but not sure why duplicate references should be
prevented.
	
	> Is there a standard somewhere which I should consult before
trying to
	nail down the functionality?
	
	Unfortunately, NO.... SQL 2003 seems to allow only column
references in
	GROUP BY clause. But both DB2 and Oracle allow expressions in
GROUP BY 
	list and likely allowed by other database vendors too. You could
use
	either DB2 or Oracle docs to understand how this functionality
is
	defined there. Much easier to read these docs than confusing SQL
2003
	spec. 
	
	
	> Enhance GROUP BY clause to support expressions instead of just
column
	references.
	>
----------------------------------------------------------------------
	> -----------
	>
	>          Key: DERBY-883 
	>          URL: http://issues.apache.org/jira/browse/DERBY-883
	>      Project: Derby
	>         Type: New Feature
	
	>   Components: SQL 
	>     Versions: 10.1.2.1
	>  Environment: JDK 1.5.0_05
	>     Reporter: Lluis Turro
	>     Assignee: Manish Khettry
	
	>
	> This select would return an error syntax on finding "(" after
month if 
	group by clause:
	> select idissue, month(creation), year(creation),
count(distinct
	> idissue) where
	>   ....
	> group by idissue, month(creation), year(creation)
	
	--
	This message is automatically generated by JIRA. 
	-
	If you think it was sent incorrectly contact one of the
administrators:
	   http://issues.apache.org/jira/secure/Administrators.jspa
	-
	For more information on JIRA, see:
	   http://www.atlassian.com/software/jira
	
	



Mime
View raw message