cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement
Date Mon, 21 Dec 2015 11:16:47 GMT


Benjamin Lerer commented on CASSANDRA-10707:

The main difficulty of this ticket is paging. Between the client and the coordinator node,
pages are returned based on grouping, but internally the data is paged by number of rows.
For example, if a {{GROUP BY}} query is used with a page size of 5000, the first page returned
to the client must contain the aggregates for the first 5000 groups, or fewer if fewer than
5000 groups exist. As these groups can be composed of a large number of rows, the coordinator
node needs to request pages of data from the other nodes until it has enough groups, in order
to avoid OOM errors. One problem is that a group can only be known to be complete once the
next group is reached or the data is exhausted.
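
To make the mismatch between row-based and group-based paging concrete, here is a minimal
sketch in Python. It is not the Cassandra implementation: {{fetch_rows}}, {{group_pages}}, the
(key, value) row shape and the use of max() as the aggregate are all assumptions made for
illustration. The rows are assumed to arrive already ordered by group key.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, int]    # hypothetical row shape: (group key, value)
Group = Tuple[str, int]  # (group key, max of the values in that group)


def fetch_rows(rows: List[Row], internal_page_size: int) -> Iterator[List[Row]]:
    """Stand-in for internal reads: yields pages sized by number of rows."""
    for start in range(0, len(rows), internal_page_size):
        yield rows[start:start + internal_page_size]


def group_pages(internal_pages: Iterable[List[Row]],
                groups_per_page: int) -> Iterator[List[Group]]:
    """Turn row-based pages into client pages sized by number of groups."""
    page: List[Group] = []
    have_group = False
    current_key = ""
    current_max = 0
    for rows in internal_pages:
        for key, value in rows:
            if have_group and key != current_key:
                # The next group has started, so the previous one is complete.
                page.append((current_key, current_max))
                if len(page) == groups_per_page:
                    yield page
                    page = []
                have_group = False
            if have_group:
                current_max = max(current_max, value)
            else:
                current_key, current_max, have_group = key, value, True
    if have_group:
        # Data exhausted: the last open group is now known to be complete.
        page.append((current_key, current_max))
    if page:
        yield page


rows = [("a", 1), ("a", 5), ("a", 3), ("b", 2), ("c", 7), ("c", 9), ("d", 4)]
client_pages = list(group_pages(fetch_rows(rows, internal_page_size=2),
                                groups_per_page=2))
```

Only the open group's running aggregate is kept in memory, not its rows. Note that the first
client page cannot be emitted until the first row of group "c" has been read, because only then
is group "b" known to be complete.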

> Add support for Group By to Select statement
> --------------------------------------------
>                 Key: CASSANDRA-10707
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
> Now that Cassandra supports aggregate functions, it makes sense to support {{GROUP BY}}
> on {{SELECT}} statements.
> It should be possible to group either at the partition level or at the clustering column
> level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1;
> {code}

This message was sent by Atlassian JIRA