crunch-user mailing list archives

From Dave Beech <d...@paraliatech.com>
Subject Re: Merging files using identity reduce
Date Tue, 05 Feb 2013 17:15:07 GMT
Thanks Josh. I'll open a JIRA

On 5 Feb 2013, at 17:13, Josh Wills <josh.wills@gmail.com> wrote:

> Sounds useful, no way to do it now, I think.
> 
> On Feb 5, 2013 12:00 PM, "Dave Beech" <dave@paraliatech.com> wrote:
>> Hi all,
>> 
>> Something I find myself doing reasonably often in MapReduce is to use
>> the reduce step as nothing more than a means to merge data into larger
>> files. Unless I've missed something in the API, there doesn't appear
>> to be a neat way to do this with Crunch. Here's what I have now:
>> 
>> PGroupedTable<MyAvroRecord, Void> grouped =
>>     collection.parallelDo(new MapFn<MyAvroRecord, Pair<MyAvroRecord, Void>>() {
>>         @Override
>>         public Pair<MyAvroRecord, Void> map(MyAvroRecord input) {
>>             return Pair.of(input, null);
>>         }
>>     }, Avros.tableOf(Avros.specifics(MyAvroRecord.class), Avros.nulls()))
>>     .groupByKey(4);
>>
>> pipeline.write(grouped, At.avroFile(MyAvroRecord.class));
>> 
>> Is there a better way? Or if not, maybe we could have a utility
>> function to do this or similar?
>> 
>> Thanks,
>> Dave
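
For readers following the thread, here is a rough plain-Java sketch of the pattern described above: every record becomes a key with no value, a hash decides which of N partitions it lands in, and the "reduce" step writes each key back out unchanged, so the output is N larger files instead of many small ones. The class and method names (IdentityMergeSketch, merge) are made up for illustration, and nothing here uses the Crunch or Hadoop APIs. A real shuffle also sorts and groups equal keys, which this sketch skips. The 4 passed in main mirrors the groupByKey(4) call in the snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; it does not use the Crunch or Hadoop APIs.
class IdentityMergeSketch {

    // Hypothetical helper: route each record to one of numPartitions buckets,
    // roughly the way a hash partitioner assigns keys to reducers.
    static <T> List<List<T>> merge(List<T> records, int numPartitions) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (T record : records) {
            // The record itself acts as the key; there is no value to carry.
            int p = Math.floorMod(record.hashCode(), numPartitions);
            // Identity "reduce": emit the key unchanged into its partition.
            partitions.get(p).add(record);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6");
        List<List<String>> out = merge(records, 4);
        // Each partition stands in for one merged output file.
        for (int i = 0; i < out.size(); i++) {
            System.out.println("part-" + i + ": " + out.get(i).size() + " records");
        }
    }
}
```

The number of partitions is the only knob: it fixes how many output files come out, regardless of how many input files went in.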
