crunch-user mailing list archives

From David Ortiz <>
Subject Re: secondary sort in crunch on spark.
Date Tue, 23 Jun 2015 13:57:30 GMT
Correct me if I'm wrong, but if you are using an Avro record or a Tuple
data structure, couldn't you get a secondary sort by just putting the
fields in the order you want the sort applied, and then using the regular
sort API? For example, if I had, say, itemid, itemprice, and nosold, and I
wanted to do something like:

select itemid, itemprice, sum(nosold) from table group by itemid,
itemprice order by itemid, itemprice asc;

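I could implement that as:

    // Sort is org.apache.crunch.lib.Sort; Aggregators is org.apache.crunch.fn.Aggregators
    PTable<Pair<Integer, Double>, Long> items = ...; // some code to load the data into this structure
    PTable<Pair<Integer, Double>, Long> totals =
        Sort.sort(items.groupByKey().combineValues(Aggregators.SUM_LONGS()));

and get something similar, right?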
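A minimal plain-Java sketch of the composite-key idea (not Crunch code): because the key (itemid, itemprice) is compared field by field, ordering on it yields the "order by itemid, itemprice asc" result, and summing nosold per key mirrors the "group by". The class name CompositeKeySort, its methods, and the sample rows are hypothetical, and the in-memory TreeMap only stands in for a distributed group-and-sort; it assumes nothing about how Crunch executes on Spark.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class CompositeKeySort {
    // Hypothetical composite key (itemid, itemprice), playing the role of Pair<Integer, Double>.
    static final class ItemKey {
        final int itemId;
        final double itemPrice;

        ItemKey(int itemId, double itemPrice) {
            this.itemId = itemId;
            this.itemPrice = itemPrice;
        }

        @Override
        public String toString() {
            return "(" + itemId + ", " + itemPrice + ")";
        }
    }

    // Compare itemid first and itemprice second: "order by itemid, itemprice asc".
    static final Comparator<ItemKey> BY_ID_THEN_PRICE =
        Comparator.<ItemKey>comparingInt(k -> k.itemId)
                  .thenComparingDouble(k -> k.itemPrice);

    // Sums nosold per (itemid, itemprice), like "group by itemid, itemprice".
    // The TreeMap keeps keys ordered by the comparator, so iteration follows the sort order.
    static TreeMap<ItemKey, Long> sumByKey(int[] itemIds, double[] itemPrices, long[] noSold) {
        TreeMap<ItemKey, Long> totals = new TreeMap<>(BY_ID_THEN_PRICE);
        for (int i = 0; i < itemIds.length; i++) {
            totals.merge(new ItemKey(itemIds[i], itemPrices[i]), noSold[i], Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: (itemid, itemprice, nosold).
        int[] ids = {2, 1, 2, 1};
        double[] prices = {9.5, 3.0, 4.0, 3.0};
        long[] sold = {5, 2, 7, 1};
        for (Map.Entry<ItemKey, Long> e : sumByKey(ids, prices, sold).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // prints:
        // (1, 3.0) -> 3
        // (2, 4.0) -> 7
        // (2, 9.5) -> 5
    }
}
```

This gives a total order over the whole composite key, which is what the SQL asks for. A secondary sort in the usual sense groups on itemid alone and only orders the values inside each group, so every record for an itemid has to reach the same group rather than simply being placed in sorted order.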

On Tue, Jun 23, 2015 at 8:52 AM Kidong Lee <> wrote:

> Hi,
> I have been using Spark to implement our recommendation algorithm. Because
> it was hard to get a secondary sort by value in Spark, I implemented
> this algorithm with the help of Hive.
> I think Spark does not support secondary sort yet.
> I have recently implemented the same recommendation algorithm in Crunch
> running on Spark, using the Crunch secondary sort API.
> I am wondering how secondary sort is implemented in Crunch running on Spark.
> Can anybody explain how secondary sort is implemented in
> Crunch on Spark?
> thanks,
> - Kidong.
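For comparison, the classic MapReduce-style secondary sort can be sketched in plain Java: partition records on the primary key only, sort each partition on the composite key (primary, then secondary), and then cut groups wherever the primary key changes, so each group's values already arrive in secondary order. Crunch exposes this pattern through its org.apache.crunch.lib.SecondarySort library class; the sketch below is only an assumption-level illustration of the general technique and does not reproduce that class or its Spark execution path. The class name SecondarySortSketch, the Rec type, the partition count, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    // One record: a primary key (e.g. a user id) and a secondary value to order by.
    static final class Rec {
        final String key;
        final int value;

        Rec(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Groups records by primary key, with each group's values in ascending secondary order.
    static Map<String, List<Integer>> secondarySort(List<Rec> input, int numPartitions) {
        // 1. Partition on the primary key only, so all records for a key meet in one partition.
        List<List<Rec>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        for (Rec r : input) {
            partitions.get(Math.floorMod(r.key.hashCode(), numPartitions)).add(r);
        }

        // 2. Sort each partition on the composite key: primary first, then secondary.
        Comparator<Rec> composite =
            Comparator.<Rec, String>comparing(r -> r.key).thenComparingInt(r -> r.value);

        // 3. Walk each sorted partition once, starting a new group whenever the primary key changes.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (List<Rec> part : partitions) {
            part.sort(composite);
            int i = 0;
            while (i < part.size()) {
                String key = part.get(i).key;
                List<Integer> values = new ArrayList<>();
                while (i < part.size() && part.get(i).key.equals(key)) {
                    values.add(part.get(i).value);
                    i++;
                }
                groups.put(key, values);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical input: unordered (key, value) records.
        List<Rec> input = List.of(new Rec("b", 3), new Rec("a", 9), new Rec("b", 1),
                                  new Rec("a", 2), new Rec("b", 2));
        secondarySort(input, 2).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

With the sample input, each key's values come out ascending (a gets [2, 9], b gets [1, 2, 3]); the order of the keys themselves depends on which partition they hash to. The design point is that only records sharing a primary key must meet in one partition, while the sort on the full composite key does the value ordering for free.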
