db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Satheesh Bandaram" <banda...@gmail.com>
Subject Re: [jira] Commented: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.
Date Wed, 24 May 2006 00:33:00 GMT
Oops... Query Select (a+b)+c, count(*) from intTab group by a+b

should be like:

Select (a+b)+1, count(*) from intTab group by a+b

Since 'c' is not part of the group by list, it can not be referenced in
select clause.

Satheesh

On 5/23/06, Satheesh Bandaram <bandaram@gmail.com> wrote:
>
> Manish, good research. I think you are on the right track. Derby already
> rewrites GROUP BY clause into select subquery so that HAVING clause can be
> converted into a WHERE clause on top of select subquery, with aggregate
> references converted into simple columnReferences. Search for groupByList in
> sqlgrammar.jj. While this rewrite makes implementation easier without any
> performance penality, it does confuse a few query resolutions. See
> DERBY-280. ( http://issues.apache.org/jira/browse/DERBY-280)
>
> So, I think both the approaches below are essentially same thing, because
> of current Derby rewrite. This rewrite could make implementing this feature
> easier, like you said. Most of the work seems to be in the compile phase.
>
> To start with, you could implement simple expression matching, which only
> finds exact matches:
>
> Select day(ts), count(*) from timeStampTab group by day(ts) having day(ts)
> = 1
>
> Then it can be expanded to support proper sub-expressions.
>
> Select (a+b)+c, count(*) from intTab group by a+b
>
> Note that DB2 and Oracle don't seem to have fancy expression matching,
> rejecting this:
>
> Select b+a, count(*) from intTab group by a+b
>
> so, Derby expression matching can be simple matching algorithm, to start
> with. Good thing with this new feature is that it can be expanded
> incrementally, without breaking existing or intermediate stage queries.
>
> Does that help?
>
> Satheesh
>
>
> On 5/20/06, Manish Khettry <manish.khettry@gmail.com> wrote:
> >
> > Thanks Alex and Satheesh, that does clarify things quite a bit.
> >
> > I spent some time looking at the GroupedAggregateResultSet and I have
> > the feeling that atleast on the execution side of things not much will have
> > to change. A grouped aggregate RS already sits on top of a ProjectRestrictRS
> > and if we can have the prRS evaluate the expressions we are grouping on, the
> > existing logic should work without any (or perhaps minimal) changes. It is
> > on the compilation side of things that this is, err rather hairy :) Someone
> > should correct me if I'm missing something here or simplifying things.
> >
> > The other approach is to rewrite the query to use a select subquery like
> > Satheesh suggested. Perhaps this is a better way to go. Any thoughts on
> > which approach is better?
> >
> >
> >
> > On 5/18/06, Alex Miller <amiller@metamatrix.com> wrote:
> > >
> > >  No, we don't go quite that far.  Because most dbs support group by
> > > expressions, we haven't needed to implement this.
> > >
> > >  ------------------------------
> > > *From:* Satheesh Bandaram [mailto:bandaram@gmail.com]
> > > *Sent:* Thursday, May 18, 2006 2:16 PM
> > >
> > > *To:* derby-dev@db.apache.org
> > > *Subject:* Re: [jira] Commented: (DERBY-883) Enhance GROUP BY clause
> > > to support expressions instead of just column references.
> > >
> > > Thanks Alex, for your message and the blogs... It was an interesting
> > > reading. Wonder if MetaMatrix rewrites queries with GROUP BY expressions for
> > > Derby into SELECT subqueries... something like:
> > >
> > > select year(hiredate), month(hiredate) as month, count(*)  from
> > > employees
> > > group by year(hiredate), month(hiredate);
> > >
> > > INTO something like:
> > >
> > > select years, months, count(*) from (select year(hiredate),
> > > month(hiredate) from employees) emp (years,months) group by years, months;
> > >
> > > This could be used as a workaround until this functionality is added
> > > to Derby.
> > >
> > > Satheesh
> > >
> > > On 5/18/06, Alex Miller < amiller@metamatrix.com> wrote:
> > > >
> > > > I blogged about GROUP BY expression support across dbs a while back
> > > > (http://tech.puredanger.com/2005/03/02/fun-with-expressions-in-a-group-b
> > > > y/ and
> > > > http://tech.puredanger.com/2005/04/01/update-on-expressions-in-group-by/
> > > >
> > > > ).  Pretty much all the major vendors support expressions in a GROUP
> > > > BY.
> > > > I implemented this functionality a while back for the MetaMatrix
> > > > query
> > > > engine and blogged about using MetaMatrix Query to "add" this
> > > > functionality over Derby in
> > > > http://devcentral.metamatrix.com/blog/alex/2006/02/28/Enhancing-Apache-D
> > > >
> > > > erby-with-MetaMatrix.
> > > >
> > > > Positional parameters don't make as much sense for GROUP BY as they
> > > > do
> > > > for ORDER BY.  Logically, you're executing the clauses in the order
> > > > FROM-WHERE-GROUP BY-HAVING-SELECT-ORDER BY.  The SELECT clause is
> > > > the
> > > > point at which the output columns of the query are effectively named
> > > > and
> > > > ordered.  Positional parameters make sense in ORDER BY because they
> > > > refer to the output columns of the previous phase, which are defined
> > > > as
> > > > part of the query, but may have no well-defined name to refer to
> > > > them
> > > > with.
> > > >
> > > > In the case of GROUP BY, you would be referring backwards from the
> > > > SELECT clause to the GROUP BY clause, so that seems kind of
> > > > goofy.  The
> > > > only reason to do this would be to avoid referencing a complex
> > > > unnamed
> > > > expression.  This is, of course, exactly the sort of thing that db
> > > > vendors bend the rules about to make SQL more usable.  In fact, you
> > > > can
> > > > use GROUP BY positional parameters in MySQL and Postgres but not in
> > > > any
> > > > major commercial db that I've tried (Oracle, DB2, SQL Server,
> > > > Sybase).
> > > >
> > > > Alex Miller
> > > > Chief Architect
> > > > MetaMatrix
> > > >
> > > > -----Original Message-----
> > > > From: Satheesh Bandaram (JIRA) [mailto:derby-dev@db.apache.org]
> > > > Sent: Wednesday, May 17, 2006 1:14 PM
> > > > To: derby-dev@db.apache.org
> > > > Subject: [jira] Commented: (DERBY-883) Enhance GROUP BY clause to
> > > > support expressions instead of just column references.
> > > >
> > > >     [
> > > > http://issues.apache.org/jira/browse/DERBY-883?page=comments#action_1241
> > > >
> > > > 2227 ]
> > > >
> > > > Satheesh Bandaram commented on DERBY-883:
> > > > -----------------------------------------
> > > >
> > > > Thanks for making progress on this important new functionality,
> > > > Manish.
> > > >
> > > > > 1. In terms of syntax, do we allow expressions in the group by
> > > > list or
> > > > positional parameters, or both?
> > > > >
> > > > > select tomonth(creationdt), toyear(creationdt), count(*) from bugs
> > > > > group by 1, 2;
> > > >
> > > > I have seen positional parameters for ORDER BY expressions, not
> > > > typically used in GROUP BY. Looking at both DB2 and Oracle
> > > > documentation, it seems neither support positional parameters.
> > > >
> > > > > An implementation question on this note-- does the language code
> > > > have
> > > > > a way of looking at two expressions (ValueNode?) and checking to
> > > > see
> > > > > if they are equivalent? We'll need some way of doing this to match
> > > > an
> > > > > expression in the group by list to an expression in the select
> > > > list
> > > > right?
> > > >
> > > > Correct. Don't think there is any existing expression matching to
> > > > compare two expressions. DB2 docs discuss how group by expressions
> > > > are
> > > > matched in SQL reference manual. (Page 484:
> > > > ftp://ftp.software.ibm.com/ps/products/db2/info/vr82/pdf/en_US/db2s1e81
> > > > .
> > > > pdf)
> > > >
> > > > > 2. I assume that an expression in a group by list must appear in
> > > > the
> > > > > select list without aggregation right? Is this an error?
> > > > >
> > > > > select x+1, x+2, sum(y)
> > > > > from test
> > > > > group by x
> > > >
> > > > NO... This is a valid query. See the reference I provided above.
> > > >
> > > > > 3. What do we do with duplicates? i.e.
> > > > >
> > > > > select x+1, x+1, sum(y)
> > > > > from test
> > > > > group by x+1, x+1;
> > > > >
> > > > > Is this an error? The current implementation throws an error if
> > > > the
> > > > > same column occurs more than once in the group by list.
> > > >
> > > > I am not sure why Derby currently considers this an error... Looking
> > > > at
> > > > the code, it seems it may be looking for ambiguous column references
> > > > (like 'x' being part of two different tables in from_list), which
> > > > makes
> > > > sense, but not sure why duplicate references should be prevented.
> > > >
> > > > > Is there a standard somewhere which I should consult before trying
> > > > to
> > > > nail down the functionality?
> > > >
> > > > Unfortunately, NO.... SQL 2003 seems to allow only column references
> > > > in
> > > > GROUP BY clause. But both DB2 and Oracle allow expressions in GROUP
> > > > BY
> > > > list and likely allowed by other database vendors too. You could use
> > > > either DB2 or Oracle docs to understand how this functionality is
> > > > defined there. Much easier to read these docs than confusing SQL
> > > > 2003
> > > > spec.
> > > >
> > > >
> > > > > Enhance GROUP BY clause to support expressions instead of just
> > > > column
> > > > references.
> > > > >
> > > > ----------------------------------------------------------------------
> > > > > -----------
> > > > >
> > > > >          Key: DERBY-883
> > > > >          URL: http://issues.apache.org/jira/browse/DERBY-883
> > > > >      Project: Derby
> > > > >         Type: New Feature
> > > >
> > > > >   Components: SQL
> > > > >     Versions: 10.1.2.1
> > > > >  Environment: JDK 1.5.0_05
> > > > >     Reporter: Lluis Turro
> > > > >     Assignee: Manish Khettry
> > > >
> > > > >
> > > > > This select would return an error syntax on finding "(" after
> > > > month if
> > > > group by clause:
> > > > > select idissue, month(creation), year(creation), count(distinct
> > > > > idissue) where
> > > > >   ....
> > > > > group by idissue, month(creation), year(creation)
> > > >
> > > > --
> > > > This message is automatically generated by JIRA.
> > > > -
> > > > If you think it was sent incorrectly contact one of the
> > > > administrators:
> > > >    http://issues.apache.org/jira/secure/Administrators.jspa
> > > > -
> > > > For more information on JIRA, see:
> > > >    http://www.atlassian.com/software/jira
> > > >
> > > >
> > >
> >
>

Mime
View raw message