cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement
Date Wed, 20 Jan 2016 09:38:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108289#comment-15108289 ]

Benjamin Lerer commented on CASSANDRA-10707:
--------------------------------------------

Given that we currently allow queries like {{SELECT max((*)), min((*)), count((*)) FROM myTable;}},
I do not really think that we can provide support for {{GROUP BY}} queries without allowing
grouping by partition key.
Some users might be interested in queries like {{SELECT max((*)), min((*)), count((*)) FROM
myTable GROUP BY partitionKey;}} or {{SELECT max((*)), min((*)), count((*)) FROM myTable WHERE
partitionKey IN (1, 2, 3) GROUP BY partitionKey;}}

Now, it is clear that those queries are not recommended and that the timeouts will probably
need to be adjusted. As with the current aggregate queries, a warning will be logged
if the partition key is not restricted by an equality.

The problem is not really the work needed to compute the aggregates; it is that the data
has to be retrieved from other nodes.

In the future, we might manage to push the aggregate computation to the replicas but we are
not there yet. 
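
As a rough illustration of the aggregation step discussed above, here is a minimal Python sketch of grouping by partition key once the rows have already been fetched. It is not Cassandra code: the function name {{group_by_partition}} and the sample rows are hypothetical, and it assumes rows arrive as (partition key, value) pairs with all rows of a partition adjacent to each other.

```python
from itertools import groupby
from operator import itemgetter


def group_by_partition(rows):
    """Yield (partition_key, max, min, count) for each run of adjacent rows
    that share a partition key.

    Assumption: rows are (partition_key, value) tuples and all rows of a
    partition are adjacent, so only one group's values are held at a time.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        values = [value for _, value in group]
        yield key, max(values), min(values), len(values)


# Hypothetical rows, as if already retrieved from the other nodes.
rows = [(1, 10), (1, 30), (2, 5), (3, 7), (3, 9), (3, 8)]
result = list(group_by_partition(rows))
```

The per-group work here is a single pass over the rows, which is consistent with the point that the cost lies in retrieving the data rather than in computing the aggregates.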

> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support {{GROUP BY}} on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
