cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10707) Add support for Group By to Select statement
Date Fri, 01 Jan 2016 21:36:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076378#comment-15076378 ]

Benjamin Lerer edited comment on CASSANDRA-10707 at 1/1/16 9:36 PM:
--------------------------------------------------------------------

Both will be supported.
What will not be supported is a {{GROUP BY}} clause that specifies only part of the partition
key. For example, if a table has a primary key like
{{PRIMARY KEY((partitionKey1, partitionKey2), clustering1, clustering2)}}, the following query
will not be supported:
{{SELECT partitionKey1, MAX(value) FROM myTable GROUP BY partitionKey1}}

As with the aggregates, the grouping will be performed on the coordinator node. Consequently,
if the driver uses the token-aware policy, a query containing a partition key predicate will
be more efficient, as the aggregates will be built on the node where the data is located.
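
To illustrate what grouping on the coordinator amounts to, here is a minimal Python sketch.
It is not Cassandra code: the function name {{group_max}}, the dict-based row format and the
assumption that rows reach the coordinator already sorted in primary key order are all
hypothetical and chosen only for this example.

```python
from itertools import groupby


def group_max(rows, group_columns, value_column="value"):
    """Fold rows into one (group key, max value) pair per group.

    Assumption: rows arrive sorted by the grouped columns (for example in
    primary key order), so every group is a contiguous run of rows.
    """
    def key(row):
        return tuple(row[column] for column in group_columns)

    result = []
    for group_key, group_rows in groupby(rows, key=key):
        # Only a running maximum is needed per group, not the full row set.
        result.append((group_key, max(row[value_column] for row in group_rows)))
    return result


rows = [
    {"partitionKey": 5, "clusteringColumn1": 1, "value": 10},
    {"partitionKey": 5, "clusteringColumn1": 1, "value": 30},
    {"partitionKey": 5, "clusteringColumn1": 2, "value": 20},
]
per_clustering = group_max(rows, ["partitionKey", "clusteringColumn1"])
per_partition = group_max(rows, ["partitionKey"])
```

With a partition key predicate and a token-aware driver, the node running this kind of fold
would also be a replica holding the rows, which is the efficiency gain described above.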

From a syntax point of view, the queries:
{{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY partitionKey, clusteringColumn1;}}
and
{{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY clusteringColumn1;}}
will both be supported because the {{partitionKey}} column is restricted by an {{=}} operator.
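
The rules above can be sketched as a small validation function. This is an illustrative
Python sketch, not the actual implementation: the name {{group_by_supported}} and its
parameters are hypothetical, and the requirement that grouped columns follow primary key
order is an assumption that this comment does not state.

```python
def group_by_supported(partition_key, clustering, group_by, eq_restricted=()):
    """Return True if the GROUP BY columns look acceptable under the rules above.

    partition_key / clustering: column names in primary key order.
    group_by: columns listed in the GROUP BY clause.
    eq_restricted: columns restricted by an '=' operator in the WHERE clause.
    """
    primary_key = list(partition_key) + list(clustering)
    grouped = set(group_by)
    pinned = set(eq_restricted)

    # Only primary key columns can be grouped on.
    if not grouped or not grouped <= set(primary_key):
        return False

    # Rule from the comment: the partition key must be covered in full,
    # either by the GROUP BY clause or by '=' restrictions.
    if any(c not in grouped and c not in pinned for c in partition_key):
        return False

    # Assumption (not stated in the comment): grouped columns form a prefix
    # of the primary key, where '='-restricted columns may be skipped.
    last = max(primary_key.index(c) for c in grouped)
    return all(c in grouped or c in pinned for c in primary_key[: last + 1])


# Partial partition key: rejected.
partial = group_by_supported(
    ["partitionKey1", "partitionKey2"], ["clustering1", "clustering2"],
    ["partitionKey1"])

# partitionKey is restricted by '=', so it may be omitted from GROUP BY.
omitted = group_by_supported(
    ["partitionKey"], ["clusteringColumn1"],
    ["clusteringColumn1"], eq_restricted=["partitionKey"])
```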



> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support {{GROUP BY}} on {{SELECT}} statements.
> It should be possible to group either at the partition level or at the clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
