db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Satheesh Bandaram <banda...@gmail.com>
Subject Re: [jira] Commented: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.
Date Sat, 03 Jun 2006 18:23:23 GMT
Manish Khettry wrote:

> Here is what I think needs to be implemented
>
> First, an expression "matching" facility. The first cut can be
> restricted to looking at two expressions and detecting if they are the
> "same"-- by same, I mean, they will evaluate to the same value at
> runtime. Thus the expression "f()" is not the same as "f()". (I don't
> see a way in the derby docs to declare a function as "deterministic"--
> is that correct?)

Derby doesn't have a way to declare a function to be "deterministic", so
two different invocations of *f() *with same parameter values could
return different values. So your concern is a valid one about matching
functions. But can we address that through documentation, that if
functions are used in group by expressions and they don't return
deterministic values, the results could be un-predictable? We already
have this issue for ORDER BY expressions, which allows use of functions.
If Derby would allow creating function indexes later, we don't want to
restrict use of functions, I think. Preferrable option may be to add
DETERMINISTIC option to create function statement....

> Second, the group by binding/validation code will need to lose
> assumptions about grouping expressions being columns. We'll also have
> to do a little query rewrite (I'm guessing) to replace references to
> the grouping expressions in the having/select clause.

Sounds right....

>
> When we have this functionality ofcourse, as you suggest in your
> email, we'll look into allowing subexpressions and such.
>
> As far as the first step goes, I plan on adding a function to the base
> class (ValueNode).
>
> protected boolean isEquivalent(ValueNode other)
> {
>   return false;
> }
>
> and having each subclass override this method and ofcourse do the
> right thing.
>
> I'll have to spend time looking at the ValueNode hierarchy and see
> what the correct behavior for "isEquivalent" is going to be for each
> class.
>
> Just wanted to swing this proposal by people more familiar with the
> code. If you see anything wrong in this proposal, let me know. I'll
> probably start on it in a day or two.
>
Good progress. This sounds right.

Satheesh

> cheers
> Manish
>
> On 5/23/06, *Satheesh Bandaram* <bandaram@gmail.com
> <mailto:bandaram@gmail.com>> wrote:
>
>     Manish, good research. I think you are on the right track. Derby
>     already rewrites GROUP BY clause into select subquery so that
>     HAVING clause can be converted into a WHERE clause on top of
>     select subquery, with aggregate references converted into simple
>     columnReferences. Search for groupByList in sqlgrammar.jj. While
>     this rewrite makes implementation easier without any performance
>     penality, it does confuse a few query resolutions. See DERBY-280.
>     ( http://issues.apache.org/jira/browse/DERBY-280)
>
>     So, I think both the approaches below are essentially same thing,
>     because of current Derby rewrite. This rewrite could make
>     implementing this feature easier, like you said. Most of the work
>     seems to be in the compile phase.
>
>     To start with, you could implement simple expression matching,
>     which only finds exact matches:
>
>     Select day(ts), count(*) from timeStampTab group by day(ts) having
>     day(ts) = 1
>
>     Then it can be expanded to support proper sub-expressions.
>
>     Select (a+b)+c, count(*) from intTab group by a+b
>
>     Note that DB2 and Oracle don't seem to have fancy expression
>     matching, rejecting this:
>
>     Select b+a, count(*) from intTab group by a+b
>
>     so, Derby expression matching can be simple matching algorithm, to
>     start with. Good thing with this new feature is that it can be
>     expanded incrementally, without breaking existing or intermediate
>     stage queries.
>
>     Does that help?
>
>     Satheesh
>
>
>     On 5/20/06, *Manish Khettry * <manish.khettry@gmail.com
>     <mailto:manish.khettry@gmail.com>> wrote:
>
>         Thanks Alex and Satheesh, that does clarify things quite a bit.
>
>         I spent some time looking at the GroupedAggregateResultSet and
>         I have the feeling that atleast on the execution side of
>         things not much will have to change. A grouped aggregate RS
>         already sits on top of a ProjectRestrictRS and if we can have
>         the prRS evaluate the expressions we are grouping on, the
>         existing logic should work without any (or perhaps minimal)
>         changes. It is on the compilation side of things that this is,
>         err rather hairy :) Someone should correct me if I'm missing
>         something here or simplifying things.
>
>         The other approach is to rewrite the query to use a select
>         subquery like Satheesh suggested. Perhaps this is a better way
>         to go. Any thoughts on which approach is better?
>
>
>
>         On 5/18/06, *Alex Miller* <amiller@metamatrix.com
>         <mailto:amiller@metamatrix.com>> wrote:
>
>             No, we don't go quite that far.  Because most dbs support
>             group by expressions, we haven't needed to implement this. 
>
>             ------------------------------------------------------------------------
>             *From:* Satheesh Bandaram [mailto:bandaram@gmail.com
>             <mailto:bandaram@gmail.com>]
>             *Sent:* Thursday, May 18, 2006 2:16 PM
>
>             *To:* derby-dev@db.apache.org <mailto:derby-dev@db.apache.org>
>             *Subject:* Re: [jira] Commented: (DERBY-883) Enhance GROUP
>             BY clause to support expressions instead of just column
>             references.
>
>             Thanks Alex, for your message and the blogs... It was an
>             interesting reading. Wonder if MetaMatrix rewrites queries
>             with GROUP BY expressions for Derby into SELECT
>             subqueries... something like:
>
>             select year(hiredate), month(hiredate) as month, count(*) 
>             from employees
>             group by year(hiredate), month(hiredate);
>
>             INTO something like:
>
>             select years, months, count(*) from (select
>             year(hiredate), month(hiredate) from employees) emp
>             (years,months) group by years, months;
>
>             This could be used as a workaround until this
>             functionality is added to Derby.
>
>             Satheesh
>
>             On 5/18/06, *Alex Miller* < amiller@metamatrix.com
>             <mailto:amiller@metamatrix.com>> wrote:
>
>                 I blogged about GROUP BY expression support across dbs
>                 a while back
>                 (
>                 http://tech.puredanger.com/2005/03/02/fun-with-expressions-in-a-group-b
>                 y/ and
>                 http://tech.puredanger.com/2005/04/01/update-on-expressions-in-group-by/
>                 <http://tech.puredanger.com/2005/04/01/update-on-expressions-in-group-by/>
>                 ).  Pretty much all the major vendors support
>                 expressions in a GROUP BY.
>                 I implemented this functionality a while back for the
>                 MetaMatrix query
>                 engine and blogged about using MetaMatrix Query to
>                 "add" this
>                 functionality over Derby in
>                 http://devcentral.metamatrix.com/blog/alex/2006/02/28/Enhancing-Apache-D
>                 <http://devcentral.metamatrix.com/blog/alex/2006/02/28/Enhancing-Apache-D>
>                 erby-with-MetaMatrix.
>
>                 Positional parameters don't make as much sense for
>                 GROUP BY as they do
>                 for ORDER BY.  Logically, you're executing the clauses
>                 in the order
>                 FROM-WHERE-GROUP BY-HAVING-SELECT-ORDER BY.  The
>                 SELECT clause is the
>                 point at which the output columns of the query are
>                 effectively named and
>                 ordered.  Positional parameters make sense in ORDER BY
>                 because they
>                 refer to the output columns of the previous phase,
>                 which are defined as
>                 part of the query, but may have no well-defined name
>                 to refer to them
>                 with.
>
>                 In the case of GROUP BY, you would be referring
>                 backwards from the
>                 SELECT clause to the GROUP BY clause, so that seems
>                 kind of goofy.  The
>                 only reason to do this would be to avoid referencing a
>                 complex unnamed
>                 expression.  This is, of course, exactly the sort of
>                 thing that db
>                 vendors bend the rules about to make SQL more
>                 usable.  In fact, you can
>                 use GROUP BY positional parameters in MySQL and
>                 Postgres but not in any
>                 major commercial db that I've tried (Oracle, DB2, SQL
>                 Server, Sybase).
>
>                 Alex Miller
>                 Chief Architect
>                 MetaMatrix
>
>                 -----Original Message-----
>                 From: Satheesh Bandaram (JIRA)
>                 [mailto:derby-dev@db.apache.org
>                 <mailto:derby-dev@db.apache.org>]
>                 Sent: Wednesday, May 17, 2006 1:14 PM
>                 To: derby-dev@db.apache.org
>                 <mailto:derby-dev@db.apache.org>
>                 Subject: [jira] Commented: (DERBY-883) Enhance GROUP
>                 BY clause to
>                 support expressions instead of just column references.
>
>                     [
>                 http://issues.apache.org/jira/browse/DERBY-883?page=comments#action_1241
>                 <http://issues.apache.org/jira/browse/DERBY-883?page=comments#action_1241>
>                 2227 ]
>
>                 Satheesh Bandaram commented on DERBY-883:
>                 -----------------------------------------
>
>                 Thanks for making progress on this important new
>                 functionality, Manish.
>
>> 1. In terms of syntax, do we allow expressions in the
>                 group by list or
>                 positional parameters, or both?
>>
>> select tomonth(creationdt), toyear(creationdt),
>                 count(*) from bugs
>> group by 1, 2;
>
>                 I have seen positional parameters for ORDER BY
>                 expressions, not
>                 typically used in GROUP BY. Looking at both DB2 and Oracle
>                 documentation, it seems neither support positional
>                 parameters.
>
>> An implementation question on this note-- does the
>                 language code have
>> a way of looking at two expressions (ValueNode?) and
>                 checking to see
>> if they are equivalent? We'll need some way of doing
>                 this to match an
>> expression in the group by list to an expression in
>                 the select list
>                 right?
>
>                 Correct. Don't think there is any existing expression
>                 matching to
>                 compare two expressions. DB2 docs discuss how group by
>                 expressions are
>                 matched in SQL reference manual. (Page 484:
>                 ftp://ftp.software.ibm.com/ps/products/db2/info/vr82/pdf/en_US/db2s1e81
>                 <ftp://ftp.software.ibm.com/ps/products/db2/info/vr82/pdf/en_US/db2s1e81>.
>                 pdf)
>
>> 2. I assume that an expression in a group by list
>                 must appear in the
>> select list without aggregation right? Is this an error?
>>
>> select x+1, x+2, sum(y)
>> from test
>> group by x
>
>                 NO... This is a valid query. See the reference I
>                 provided above.
>
>> 3. What do we do with duplicates? i.e.
>>
>> select x+1, x+1, sum(y)
>> from test
>> group by x+1, x+1;
>>
>> Is this an error? The current implementation throws
>                 an error if the
>> same column occurs more than once in the group by list.
>
>                 I am not sure why Derby currently considers this an
>                 error... Looking at
>                 the code, it seems it may be looking for ambiguous
>                 column references
>                 (like 'x' being part of two different tables in
>                 from_list), which makes
>                 sense, but not sure why duplicate references should be
>                 prevented.
>
>> Is there a standard somewhere which I should consult
>                 before trying to
>                 nail down the functionality?
>
>                 Unfortunately, NO.... SQL 2003 seems to allow only
>                 column references in
>                 GROUP BY clause. But both DB2 and Oracle allow
>                 expressions in GROUP BY
>                 list and likely allowed by other database vendors too.
>                 You could use
>                 either DB2 or Oracle docs to understand how this
>                 functionality is
>                 defined there. Much easier to read these docs than
>                 confusing SQL 2003
>                 spec.
>
>
>> Enhance GROUP BY clause to support expressions
>                 instead of just column
>                 references.
>> ----------------------------------------------------------------------
>> -----------
>>
>>          Key: DERBY-883
>>          URL:
>                 http://issues.apache.org/jira/browse/DERBY-883
>>      Project: Derby
>>         Type: New Feature
>
>>   Components: SQL
>>     Versions: 10.1.2.1 <http://10.1.2.1>
>>  Environment: JDK 1.5.0_05
>>     Reporter: Lluis Turro
>>     Assignee: Manish Khettry
>
>>
>> This select would return an error syntax on finding
>                 "(" after month if
>                 group by clause:
>> select idissue, month(creation), year(creation),
>                 count(distinct
>> idissue) where
>>   ....
>> group by idissue, month(creation), year(creation)
>
>                 --
>                 This message is automatically generated by JIRA.
>                 -
>                 If you think it was sent incorrectly contact one of
>                 the administrators:
>                   
>                 http://issues.apache.org/jira/secure/Administrators.jspa
>                 -
>                 For more information on JIRA, see:
>                    http://www.atlassian.com/software/jira
>
>
>
>
>



Mime
View raw message