cassandra-commits mailing list archives

From "Jon Haddad (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement
Date Thu, 25 Feb 2016 17:42:22 GMT


Jon Haddad commented on CASSANDRA-10707:

I don't think changing the order of ORDER BY and GROUP BY is self-explanatory, so it doesn't
really offer any benefit, imo.  If I was trying out the feature I'd mostly be annoyed by its
difference from something I've got muscle memory for.

If you wanted to be technically accurate about it, SQL is declarative.  The order in which
you specify the clauses doesn't matter, it just happens to line up with how we mentally process
it.  If you change the order of predicates in your WHERE clause it doesn't matter, you'll
still end up with the same query result.
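
A quick way to check the point about predicate order is below. This is only a sketch: it uses
SQLite (through Python's sqlite3 module) as a stand-in SQL engine, not Cassandra, and the
table layout and rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; the schema and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE top_scores (username TEXT, score INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO top_scores VALUES (?, ?, ?)",
    [("ann", 90, "CA"), ("bob", 70, "CA"), ("cat", 85, "NY"), ("dan", 40, "NY")],
)

# Same two predicates, written in opposite order.
q1 = "SELECT username FROM top_scores WHERE state = 'CA' AND score > 80 ORDER BY username"
q2 = "SELECT username FROM top_scores WHERE score > 80 AND state = 'CA' ORDER BY username"

rows_a = conn.execute(q1).fetchall()
rows_b = conn.execute(q2).fetchall()

# Both orderings of the WHERE predicates return the same rows.
print(rows_a == rows_b)
```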

Assuming I'm understanding the implementation correctly, what you're saying is that the query
behaves more like the following:

select * from 
 ( select * from table order by some_field limit 100)
group by x,y,z

Is this correct, or am I missing something?  If it's the case, I hope this doesn't box us
in later down the line if we want to add support for other operations (like subqueries).
If we're going to introduce more inconsistencies with SQL (which may be totally fair, I'm
just thinking out loud here), we would want to put the GROUP BY after the LIMIT, since it's
being applied then.  I'm not sure what this does to CQL in general, as now we've implicitly
made the decision to introduce clauses in an imperative fashion.  I'd rather not see new clauses
added piece by piece with different rules depending on the context, that definitely won't
make things any easier.
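
To make the difference between the two readings concrete, here is a rough sketch. Again it
uses SQLite through Python as a stand-in (it is not CQL and says nothing about how the patch
is implemented), and the table and rows are invented for the example. It compares "limit,
then group" (the subquery form above) with "group, then limit" (what SQL users would expect).

```python
import sqlite3

# In-memory SQLite database; the schema and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE top_scores (username TEXT, score INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO top_scores VALUES (?, ?, ?)",
    [
        ("ann", 90, "CA"),
        ("bob", 70, "CA"),
        ("cat", 85, "NY"),
        ("dan", 40, "NY"),
        ("eve", 95, "TX"),
    ],
)

# Reading 1: the limit is applied to rows first, then the survivors are grouped.
limit_then_group = conn.execute(
    """
    SELECT state, COUNT(*) FROM
      (SELECT * FROM top_scores ORDER BY score DESC LIMIT 3)
    GROUP BY state ORDER BY state
    """
).fetchall()

# Reading 2: all rows are grouped first, then the limit is applied to the groups.
group_then_limit = conn.execute(
    "SELECT state, COUNT(*) FROM top_scores GROUP BY state ORDER BY state LIMIT 3"
).fetchall()

print(limit_then_group)
print(group_then_limit)
```

With this sample data the two readings disagree on the per-state counts, which is why the
position of LIMIT relative to GROUP BY matters.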

So my question is, is CQL a declarative language or not?  Will this ever be something we intend
to allow:

select username, score, state, count(state) as c from top_scores where game_id=5 limit 1000
group by state order by c desc limit 5;

I don't think the above query works at all.  The aggregation is clearly a declarative clause.

Now, whether applying the limit before the aggregation is the right behavior is something
I might have to argue with.

> Add support for Group By to Select statement
> --------------------------------------------
>                 Key: CASSANDRA-10707
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
> Now that Cassandra supports aggregate functions, it makes sense to support {{GROUP BY}}
> on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1;
> {code}
