cassandra-commits mailing list archives

From "Ajay (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9767) Allow the selection of columns together with aggregates
Date Thu, 09 Jul 2015 16:12:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620758#comment-14620758 ]

Ajay edited comment on CASSANDRA-9767 at 7/9/15 4:11 PM:
---------------------------------------------------------

I raised this issue mainly to allow selecting columns along with count(*) (and the other
aggregates, which Cassandra supports as of 2.2). Although this makes sense with GROUP BY, I
didn't raise the issue to request GROUP BY support, because such requests are usually shot down
immediately with the advice to use Spark for such cases. Now that cross-node aggregation is
supported in 2.2 (thanks, I did not know this before), it makes sense (and should not be too
difficult) to support GROUP BY/HAVING or similar in CQL as well.


was (Author: ajaygarga):
I raised this bug majorily for allowing the selection of columns along with count (*) (other
aggregates as Cassandra supporting it from 2,2).Though it make sense with GROUP BY, I didn't
raise the bug to GROUP BY as such issues usually are shot down immediately saying use Spark
for such cases. Now having supported cross nodes aggregations in 2.2 (thanks. I was not knowing
this before), it make sense (and should not be much difficult) to support GROUP BY/HAVING
or similar in CQL as well.

> Allow the selection of columns together with aggregates
> -------------------------------------------------------
>
>                 Key: CASSANDRA-9767
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9767
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core
>         Environment: Cassandra 2.0.16
> Ubuntu 15.04
>            Reporter: Ajay
>            Assignee: Benjamin Lerer
>            Priority: Minor
>
> Let's assume we have a column family as below:
> create table sample ( track_id int, user_id int, country varchar, primary key ((track_id), user_id));
> where track_id is the partition key.
> Now, to count the rows for a single track_id, we can query using CQL as below:
> select count(*) from sample where track_id = 1 and user_id = 1;
> But that will return only the count. If we need other columns along with the count,
> we cannot query as below, as it throws an error:
>  select count(*), country from sample where track_id = 1 and user_id = 1;
> Bad Request: line 1:15 mismatched input ',' expecting K_FROM.
> In this case, all rows for a given track_id and user_id will have the same value for
> country, so we should be able to query as above. Also, in SQL it is possible to select
> columns along with aggregate functions.
> Though I know that Cassandra is not an analytics engine (unlike Hadoop and Spark), we need
> some basic aggregate functions like min, max, avg, etc. Though it might not be efficient
> performance-wise, it is better done on the Cassandra side (as it uses the native protocol)
> than fetching all rows to the client and doing the basic aggregation there. It cannot be
> used just as a data store (garbage in, garbage out). In that context, CQL is currently
> pretty limited. Just for getting data out of Cassandra, we will have to use Spark, even
> though we will not be doing much analytics on it.
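
For illustration only, here is a minimal sketch of the client-side aggregation the description wants to avoid: fetch every matching row, then compute the count and pick a column value in application code. The rows are hypothetical in-memory data, not a driver result, and the helper name `count_with_column` is an assumption made for this sketch. Returning the column value from the first row is just one possible semantics, assumed here because the description says all matching rows share the same country.

```python
# Hypothetical rows, shaped like a result of:
#   select track_id, user_id, country from sample where track_id = 1
rows = [
    {"track_id": 1, "user_id": 1, "country": "IN"},
    {"track_id": 1, "user_id": 2, "country": "IN"},
]


def count_with_column(rows, column):
    """Return (row count, column value of the first row), computed on the client.

    Assumes every row carries the same value for `column`; returns (0, None)
    when there are no rows.
    """
    if not rows:
        return (0, None)
    return (len(rows), rows[0][column])


count, country = count_with_column(rows, "country")
print(count, country)
```

Every matching row has to cross the network before the count can be computed, which is the cost the description argues a server-side aggregate would avoid.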



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
