crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Merging files using identity reduce
Date Tue, 05 Feb 2013 17:13:04 GMT
Sounds useful. I don't think there's a way to do it right now.
On Feb 5, 2013 12:00 PM, "Dave Beech" <dave@paraliatech.com> wrote:

> Hi all,
>
> Something I find myself doing reasonably often in MapReduce is to use
> the reduce step as nothing more than a means to merge data into larger
> files. Unless I've missed something in the API, there doesn't appear
> to be a neat way to do this with Crunch. Here's what I have now:
>
> PGroupedTable<MyAvroRecord, Void> grouped = collection
>     .parallelDo(new MapFn<MyAvroRecord, Pair<MyAvroRecord, Void>>() {
>         @Override
>         public Pair<MyAvroRecord, Void> map(MyAvroRecord input) {
>             return Pair.of(input, null);
>         }
>     }, Avros.tableOf(Avros.specifics(MyAvroRecord.class), Avros.nulls()))
>     .groupByKey(4);
>
> pipeline.write(grouped, At.avroFile(MyAvroRecord.class));
>
> Is there a better way? Or if not, maybe we could have a utility
> function to do this or similar?
>
> Thanks,
> Dave
>
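
For readers unfamiliar with the trick Dave describes, here is a rough plain-Java model of what an identity reduce does to the data. It is only a sketch and does not use Hadoop or Crunch. The class name IdentityReduceSketch and the method merge are made up for illustration, records are plain strings rather than Avro records, and the routing rule (non-negative hash modulo the reducer count) is assumed to mirror a typical hash partitioner. Sorting and grouping of equal keys, which a real shuffle also performs, are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of an identity reduce; hypothetical, no Hadoop or Crunch involved. */
final class IdentityReduceSketch {

    /**
     * Merges many small inputs into numReducers outputs. Each record is
     * routed by the hash of its key (here, the record itself) and written
     * back unchanged, which is all an identity reducer does.
     */
    static List<List<String>> merge(List<List<String>> smallFiles, int numReducers) {
        List<List<String>> outputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            outputs.add(new ArrayList<>());
        }
        for (List<String> file : smallFiles) {
            for (String record : file) {
                // Assumed routing rule: non-negative hash modulo the reducer count.
                int partition = (record.hashCode() & Integer.MAX_VALUE) % numReducers;
                outputs.get(partition).add(record);
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Four small "files" collapse into two larger ones.
        List<List<String>> smallFiles = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c"),
            Arrays.asList("d", "e"),
            Arrays.asList("f"));
        List<List<String>> merged = merge(smallFiles, 2);
        System.out.println(merged.size() + " output files: " + merged);
    }
}
```

The number of output files equals the reducer count regardless of how many inputs there were, which is the effect groupByKey(4) is being used for in the snippet above.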
