cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4914) Aggregate functions in CQL
Date Fri, 09 Nov 2012 09:58:14 GMT


Sylvain Lebresne commented on CASSANDRA-4914:

I'm not necessarily opposed to the idea on principle, but unless we have some fancy idea,
this will only save network traffic between the client and the coordinator, as internally we'll
still have to pull all the data (though that's not very different from what we do for count today).
This means that if we do this, we should be clear about that fact and that people should still
go the Hadoop route for large aggregations.

I'm also halfway convinced that it wouldn't be much harder to support custom "filter" functions,
i.e. to allow people to define some class having a method along the lines of:
    public ResultSet filter(ResultSet rs);
So it might be worth going that more general route right away and just providing a
number of default aggregation functions.
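
To make the "filter" idea concrete, here is a minimal sketch of what such a hook and one default aggregation could look like. Everything in it is an assumption made for illustration: the `ResultSet` class is a simplified stand-in (a list of column-name-to-value maps), not Cassandra's actual class, and `ResultFilter`, `SumFilter` and `FilterSketch` are hypothetical names. It only shows the shape of the suggestion: the coordinator still holds all the rows, and the filter collapses them before they are sent to the client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a CQL result set: an ordered list of rows,
// each row mapping a column name to its value. Not Cassandra's real class.
final class ResultSet {
    final List<Map<String, Object>> rows;

    ResultSet(List<Map<String, Object>> rows) {
        this.rows = rows;
    }
}

// Hypothetical extension point using the method signature from the comment.
interface ResultFilter {
    ResultSet filter(ResultSet rs);
}

// One possible "default aggregation": collapse all rows into a single row
// holding the sum of one numeric column.
final class SumFilter implements ResultFilter {
    private final String column;

    SumFilter(String column) {
        this.column = column;
    }

    @Override
    public ResultSet filter(ResultSet rs) {
        double total = 0.0;
        for (Map<String, Object> row : rs.rows) {
            total += ((Number) row.get(column)).doubleValue();
        }
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("sum(" + column + ")", total);
        List<Map<String, Object>> result = new ArrayList<>();
        result.add(out);
        return new ResultSet(result);
    }
}

class FilterSketch {
    // Builds one row shaped like the emp example in the issue description.
    static Map<String, Object> row(int empId, int deptId, double salary) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("empid", empId);
        r.put("deptid", deptId);
        r.put("salary", salary);
        return r;
    }

    // Applies the sum filter to the three sample rows for empid 130.
    static ResultSet demo() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row(130, 3, 10.1));
        rows.add(row(130, 2, 100.0));
        rows.add(row(130, 1, 1000.0));
        return new SumFilter("salary").filter(new ResultSet(rows));
    }

    public static void main(String[] args) {
        System.out.println(demo().rows);
    }
}
```

In this sketch AVG, MIN or MAX would simply be other `ResultFilter` implementations, which is the sense in which the general route subsumes a fixed list of built-in aggregates.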

I'm also not sure it's wise to support this until we can properly page CQL queries (i.e. I
think this should depend on CASSANDRA-4415). Also, I think it would be weird to introduce
aggregation before we remove our current arbitrary select limit (though I'm in favor of doing
that sooner rather than later: CASSANDRA-4918).

Lastly, aggregation might lose a bit of its usefulness without proper support for DISTINCT.
So overall my opinion would be: if we do do that, let's push that to 1.3 and do that correctly.
> Aggregate functions in CQL
> --------------------------
>                 Key: CASSANDRA-4914
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Vijay
>            Assignee: Vijay
>             Fix For: 1.2.1
> The requirement is to do aggregation of data in Cassandra (wide row of column values of int, double, float, etc.).
> With some basic aggregate functions like AVG, SUM, Mean, Min, Max, etc. (for the columns within a row).
> Example:
> SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;
>  empid | deptid | first_name | last_name | salary
> -------+--------+------------+-----------+--------
>    130 |      3 |     joe    |     doe   |   10.1
>    130 |      2 |     joe    |     doe   |    100
>    130 |      1 |     joe    |     doe   |  1e+03
> SELECT sum(salary), empid FROM emp WHERE empID IN (130);
>  sum(salary) | empid
> -------------+--------
>    1110.1    |  130

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators