cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
Date Fri, 03 Oct 2014 21:50:37 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158569#comment-14158569 ]

Tyler Hobbs commented on CASSANDRA-4914:
----------------------------------------

I'm thinking a bit about making this compatible with UDFs.  The problem with this approach
is that it relies on state that's not visible to the aggregation functions.

An alternative that would be (more easily) compatible with UDFs is a reduce-style aggregation.
 The reducer function takes two inputs: the current state and the next value.  You can optionally
provide an initial state and a finalizer function that is called with the final state after
reducing. UDTs, tuples, and collections should be sufficiently powerful to represent anything
that's needed for state.
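
As a rough illustration of the reduce-style idea (a Python sketch, not proposed CQL syntax
or Cassandra code), the average aggregate can be written with a tuple as state. The names
aggregate, state_func, initial_state, and final_func are hypothetical, and seeding the
state with the first value when no initial state is given is an assumption of this sketch.

```python
from functools import reduce


def aggregate(values, state_func, initial_state=None, final_func=None):
    """Fold values into a state with state_func, then optionally finalize it."""
    values = iter(values)
    if initial_state is None:
        # Assumption of this sketch: with no initial state, the first
        # value becomes the starting state.
        try:
            initial_state = next(values)
        except StopIteration:
            return None  # nothing to aggregate
    state = reduce(state_func, values, initial_state)
    return final_func(state) if final_func is not None else state


def avg_state(state, value):
    """Reducer: takes the current (sum, count) state and the next value."""
    total, count = state
    return (total + value, count + 1)


def avg_final(state):
    """Finalizer: called once with the final (sum, count) state."""
    total, count = state
    return total / count if count else None


# Salary values taken from the example rows in the ticket description.
salaries = [10.1, 100.0, 1000.0]
total = aggregate(salaries, lambda s, v: s + v, initial_state=0.0)
mean = aggregate(salaries, avg_state, initial_state=(0.0, 0), final_func=avg_final)
highest = aggregate(salaries, max)  # no initial state, no finalizer
```

Here sum needs only a scalar state, avg needs a (sum, count) tuple plus a finalizer, and max
needs neither an initial state nor a finalizer.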

In fact, Postgres's approach to user-defined aggregation functions is almost exactly this:
http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html.  I think we could slightly
simplify their approach by inferring the data types.

> Aggregation functions in CQL
> ----------------------------
>
>                 Key: CASSANDRA-4914
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Vijay
>            Assignee: Benjamin Lerer
>              Labels: cql, docs
>             Fix For: 3.0
>
>         Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914.txt
>
>
> The requirement is to do aggregation of data in Cassandra (wide row of column values
> of int, double, float, etc.),
> with some basic aggregate functions like AVG, SUM, Mean, Min, Max, etc. (for the columns
> within a row).
> Example:
> SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;
>  empid | deptid | first_name | last_name | salary
> -------+--------+------------+-----------+--------
>    130 |      3 |     joe    |     doe   |   10.1
>    130 |      2 |     joe    |     doe   |    100
>    130 |      1 |     joe    |     doe   |  1e+03
>  
> SELECT sum(salary), empid FROM emp WHERE empID IN (130);
>  sum(salary) | empid
> -------------+--------
>    1110.1    |  130



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
