crunch-user mailing list archives

From Dave Beech <d...@paraliatech.com>
Subject Merging files using identity reduce
Date Tue, 05 Feb 2013 16:59:49 GMT
Hi all,

Something I find myself doing reasonably often in MapReduce is to use
the reduce step as nothing more than a means to merge data into larger
files. Unless I've missed something in the API, there doesn't appear
to be a neat way to do this with Crunch. Here's what I have now:

// Key each record by itself with a null value, so the reduce step has
// nothing to do beyond passing records through.
PGroupedTable<MyAvroRecord, Void> grouped = collection
    .parallelDo(new MapFn<MyAvroRecord, Pair<MyAvroRecord, Void>>() {
        @Override
        public Pair<MyAvroRecord, Void> map(MyAvroRecord input) {
            return Pair.of(input, null);
        }
    }, Avros.tableOf(Avros.specifics(MyAvroRecord.class), Avros.nulls()))
    .groupByKey(4);  // 4 reduce partitions

pipeline.write(grouped, At.avroFile(MyAvroRecord.class));
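
To make the intent concrete, here is a minimal, library-free sketch in
plain Java of what the identity reduce amounts to. It does not use the
Crunch API: IdentityReduceSketch and merge are hypothetical names,
in-memory lists stand in for files, and it skips the sorting and grouping
of equal keys that a real shuffle performs.

```java
import java.util.ArrayList;
import java.util.List;

// Library-free illustration of an identity reduce; this is not Crunch code.
final class IdentityReduceSketch {

    // Hypothetical helper: spreads every record from the small inputs across
    // numReducers outputs, leaving each record unchanged.
    static <T> List<List<T>> merge(List<List<T>> smallFiles, int numReducers) {
        List<List<T>> merged = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            merged.add(new ArrayList<T>());
        }
        for (List<T> file : smallFiles) {
            for (T record : file) {
                // The record is its own key, so its hash picks the partition.
                int partition = Math.floorMod(record.hashCode(), numReducers);
                merged.get(partition).add(record);
            }
        }
        return merged;
    }
}
```

However many small inputs go in, exactly numReducers outputs come out,
which is the only effect I want from the reduce step.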

Is there a better way? Or if not, maybe we could have a utility
function to do this or similar?

Thanks,
Dave
