cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Stupp <sn...@snazy.de>
Subject Re: Cassandra: UDF
Date Wed, 05 Aug 2015 11:02:42 GMT
Suresh,

tip: you can use alternative (”pg-style”) string delimiters, which can span over multiple
lines and makes the CQL statement much nicer:
CREATE OR REPLACE FUNCTION state_groupbyandsum (
	state map<text, double>, datetime text, amount text )
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java 
AS $$
	String date = datetime.substring(0,10);
	Double count = (Double) state.get(date);
...
	return state;
$$ ;

UDAs are best suited for queries against a single partition - not against a possibly really
huge table.
This is nothing special for UDAs as you should always code your queries to hit a single partition.

User defined aggregates are not meant to do the job of (or even replace) an analytics framework
like Apache Spark.
Frankly, Top-K-queries over a big data set are best suited for Spark using the Cassandra-Spark-Connector.

In your case: imagine your query returns 1B rows - all that information must be held in the
map in the Java heap of the coordinator (the node that runs the UDA).

You can do Top-K query with UDAs over the whole table - and rely on the fact that rows passed
to the state function are grouped by their partition key (assuming that ‘datetime’ is
in your partition key) AND kicking datetime values out of your state-map that do not match
the Top-K criteria.
BUT: I do NOT recommend to do that upon user request - instead in a batch job and pipe the
result in another table for fast read access.

Robert


> On 05 Aug 2015, at 12:09, Suresh Mahawar <suresh.mahawar@technocube.in> wrote:
> 
> Hi,
> 
> I need your help. I have a query which get top 5 records group by date (not date + time)
and sum of amount.
> 
> I wrote the following but it returns all the records not just top 5 records
> 
> CREATE OR REPLACE FUNCTION state_groupbyandsum( state map<text, double>, datetime
text, amount text )
> CALLED ON NULL INPUT
> RETURNS map<text, double>
> LANGUAGE java 
> AS 'String date = datetime.substring(0,10); Double count = (Double) state.get(date);
 if (count == null) count = Double.parseDouble(amount); else count = count +  Double.parseDouble(amount);
state.put(date, count); return state;' ;
> 
> 
> CREATE OR REPLACE AGGREGATE groupbyandsum(text, text) 
> SFUNC state_groupbyandsum
> STYPE map<text, double>
> INITCOND {};
> 
> select groupbyandsum(datetime, amout) from warehouse;
> 
> Could you please help out to get just 5 records.
> 
> 
> Thanks & Regards,
> Suresh Mahawar
> TechnoCube
> Find Me on Linkedin <https://www.linkedin.com/pub/suresh-mahawar/2a/b9/a80>
—
Robert Stupp
@snazy


Mime
View raw message