cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Riccardo Ferrari <ferra...@gmail.com>
Subject Re: About UDF/UDA
Date Thu, 27 Sep 2018 04:55:18 GMT
Thank you Doan,

 Indeed I'm using a FINALFUNC to compute the average already.
Bit more context, I'm working on bucketized data, each bucket has already
an 'event count' and an 'average' in it.
My functions look as follow:

//SFUNC
CREATE OR REPLACE FUNCTION summaryState(state map<text, frozen<
tuple<bigint, bigint, bigint, bigint> >>, name text, avgloadtime int,
eventcount int)
    CALLED ON NULL INPUT
    RETURNS map<text, frozen< tuple<bigint, bigint, bigint, bigint> >>
    LANGUAGE java
    AS $$
        if (name != null) {
            com.datastax.driver.core.TupleValue stats =
(com.datastax.driver.core.TupleValue)state.get(name);

            if (stats == null) {
                com.datastax.driver.core.TupleType statsType =
com.datastax.driver.core.TupleType.of(com.datastax.driver.core.ProtocolVersion.NEWEST_SUPPORTED,
com.datastax.driver.core.CodecRegistry.DEFAULT_INSTANCE,
com.datastax.driver.core.DataType.bigint(),
com.datastax.driver.core.DataType.bigint(),com.datastax.driver.core.DataType.bigint(),com.datastax.driver.core.DataType.bigint());

                stats = statsType.newValue(Long.MAX_VALUE, 0L, 0L, 0L);
            }

            //Track min
            Long min_ = (Long) stats.getLong(0);
            min_ = min_ < avgloadtime ?  min_ : avgloadtime;
            stats.setLong(0, min_);

            //Track max
            Long max_ = (Long) stats.getLong(1);
            max_ = max_ > avgloadtime ?  max_ : avgloadtime;
            stats.setLong(1, max_);

            //Unroll average
            Long avgSum = (Long) stats.getLong(2);
            avgSum = avgSum + avgloadtime;
            stats.setLong(2, avgSum);

            //Event count
            Long sampleSum = (Long) stats.getLong(3);
            sampleSum = sampleSum + eventcount;
            stats.setLong(3, sampleSum);

            state.put(name, stats);
        }
        return state;
    $$;

//FINALFUNC
CREATE OR REPLACE FUNCTION summaryFinal (state map<text, frozen<
tuple<bigint, bigint, bigint, bigint> >>)
    CALLED ON NULL INPUT
    RETURNS map<text, frozen< tuple<bigint, bigint, bigint, bigint> >>
    LANGUAGE java
    AS $$
        for (Object name : state.keySet()) {
            com.datastax.driver.core.TupleValue stats =
(com.datastax.driver.core.TupleValue) state.get(name);

            Long avgSum = stats.getLong(2);
            Long sampleSum = stats.getLong(3);

            // Workaround: I can't escame the '/' using cql and had to use
Math.pow
            double avg_ = avgSum * Math.pow(sampleSum, -1);
            stats.setLong(2, new Double(avg_).longValue());

            state.put(name, stats);
        }
        return state;
    $$;
//AGGREGATE
CREATE OR REPLACE AGGREGATE summary(text, int, int)
    SFUNC summaryState
    STYPE map<text, frozen< tuple<bigint, bigint, bigint, bigint> >>
    FINALFUNC summaryFinal
    INITCOND {};

This gives me the following output:
<TABLE>.summary(event, averageloadtime, count)
------------------------------------------------------------------------------
 {'<item1>': (365, 870, 617, 2), '<item2>'': (381, 11668, 6024, 2)}

I would like to have something lke:
| item        | min          | max        | average     | count |
-----------------------------------------------------------------------
| <item1>  | 365         | 870         | 617            | 2        |
| <item2>  | 381         | 11668     | 6024          | 2        |

Do you know if that is possible?

On Wed, Sep 26, 2018 at 10:21 PM DuyHai Doan <doanduyhai@gmail.com> wrote:

> A hint to answer your Q3 is to use a final function to perform the
> flattening or transformation on the result of the aggregation
>
> The syntax of an UDA is:
>
> CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
> aggregateName(type1, type2, …)
> SFUNC accumulatorFunction
> STYPE stateType
> [FINALFUNC finalFunction]
> INITCOND initCond;
>
>
> The final return type will be the return type of the FINALFUNC and not
> necessarily the stateType
>
> More details by reading my blog post on it:
> http://www.doanduyhai.com/blog/?p=1876
>
> On Wed, Sep 26, 2018 at 3:58 PM, Riccardo Ferrari <ferrarir@gmail.com>
> wrote:
>
>> Hi users!
>>
>> Given my Cassandra version 3.0.x I don't have the famous GROUP BY
>> operator available. So looking around I turned to UDAs.
>>
>> I'm aware all/most of the magic happens on the coordinator and the plan
>> is to keep the data volume low to avoid too much pressure.
>>
>> Q1: How much is low volume. It's obvious the answer is depends but, has
>> anyone some experience to share?
>>
>> Q2: Do I undestand correctly is does not support pagination?
>>
>> I need something as simple as extract `min`, `max`, `average` and
>> `count` per group where I don't know the actual group - I can't fire a
>> query per each group name. - so something like `SELECT my_uda(field1,
>> field2) WHERE ...;`
>> This leads to:
>> - a function that tracks min, max and sum up count and average. The state
>> is a tuple
>> - a final function that computes the average.
>> - the aggregate function that uses the previous two
>> the result is something like
>> | 'item': (min_value, max_value, avg_value, count) , 'item2': (...),  ...|
>> Q3: Is there a way to `flatten` or `explode` the result into multiple
>> lines ?
>> If Q3 answer is yes: Is there a way to create multiple columns out of the
>> result:
>> ||other_fileds | item | min | max | avg | count||
>>
>> BONUS: Are there altenative? Should I really take into account upgrading
>> to 3.11.X ?
>> Thanks!
>>
>
>

Mime
View raw message