incubator-crunch-user mailing list archives

From Dave Beech
Subject Merging files using identity reduce
Date Tue, 05 Feb 2013 16:59:49 GMT
Hi all,

Something I find myself doing reasonably often in MapReduce is to use
the reduce step as nothing more than a means to merge data into larger
files. Unless I've missed something in the API, there doesn't appear
to be a neat way to do this with Crunch. Here's what I have now:

PGroupedTable<MyAvroRecord, Void> grouped =
  collection.parallelDo(new MapFn<MyAvroRecord, Pair<MyAvroRecord, Void>>() {
      // Emit each record as a key with a null value; the reduce adds no logic.
      public Pair<MyAvroRecord, Void> map(MyAvroRecord input) {
          return Pair.of(input, null);
      }
  }, Avros.tableOf(Avros.specifics(MyAvroRecord.class),


Is there a better way? Or if not, maybe we could have a utility
function to do this or similar?
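
To make the intent concrete, here is a rough plain-Java model of what an
identity reduce does to file layout. It does not use Crunch or Hadoop at all:
the class name IdentityReduceMerge, the merge method, and the hash-based
partitioning are assumptions made for illustration only, with each inner list
standing in for one file.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentityReduceMerge {

    // Routes every record to one of numReducers partitions by hash and
    // passes it through unchanged, so each partition models one output file.
    // Hypothetical helper: not part of the Crunch or Hadoop API.
    static <T> List<List<T>> merge(List<List<T>> smallFiles, int numReducers) {
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            outputs.add(new ArrayList<>());
        }
        for (List<T> file : smallFiles) {
            for (T record : file) {
                // floorMod keeps the index non-negative for negative hash codes.
                int partition = Math.floorMod(record.hashCode(), numReducers);
                outputs.get(partition).add(record); // identity: no transformation
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Four small "files" holding six records in total.
        List<List<String>> smallFiles = List.of(
            List.of("a", "b"), List.of("c"), List.of("d", "e"), List.of("f"));
        List<List<String>> merged = merge(smallFiles, 2);
        int total = 0;
        for (List<String> out : merged) {
            total += out.size();
        }
        System.out.println(merged.size() + " output files, " + total + " records");
        // prints "2 output files, 6 records"
    }
}
```

The number of output files is set by the reducer count rather than by the
number of inputs, which is the only effect wanted here.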

