crunch-user mailing list archives

From Hrishikesh P <hrishi.engin...@gmail.com>
Subject Re: ParallelDo - DoFn in-order processing
Date Fri, 15 Nov 2013 21:13:09 GMT
That sounds good, I'll try that. Thanks!


On Fri, Nov 15, 2013 at 10:45 AM, Josh Wills <josh.wills@gmail.com> wrote:

> One way, of course, is to do a group by key and force all of the records
> to a single reducer.
>
> Post-sort, I believe it's a safe assumption that the records will be
> processed by a DoFn in sorted order, although it's not necessarily the case
> that records with the same value of the key (if that ever happens in your
> data) will be processed in the same shard/DoFn.
>
> J
>
>
> On Fri, Nov 15, 2013 at 8:38 AM, Hrishikesh P <hrishi.engineer@gmail.com> wrote:
>
>> Hello -
>>
>>
>> In the parallelDo-DoFn processing, is it possible to ensure that the
>> records in the PTable will be processed in the given order? I have a PTable
>> of long and bytes (PTable<Long, ByteBuffer>) which is sorted by the long
>> value and I want to make sure that when the DoFn#process is called, the
>> records will be processed in the sorted order, as there may be a dependency
>> between the records.
>>
>>
>> I thought of a few options, like storing the sorted results to a text
>> file and using the file to process the records in the DoFn or using a table
>> to track the records being processed but wasn't sure if they would give
>> correct results and was wondering if there is a better approach.
>>
>>
>> Thanks.
>>
>
>
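Josh's first suggestion (group by key and force every record through a single reducer) can be illustrated with a small plain-Java simulation. This is not Crunch code. The `TreeMap` stands in for the shuffle/sort step, `StatefulProcessor` is a hypothetical stand-in for a DoFn that keeps state between `process` calls, and the `Long` "delta" values are made-up payloads used in place of the `ByteBuffer` values from the question. In an actual Crunch pipeline the single-reducer step would presumably be the `PTable.groupByKey(int numPartitions)` overload called with one partition; check the Crunch Javadoc before relying on that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java simulation (NOT the Crunch API) of the "single reducer" idea:
 * group records by key, emit keys in ascending order, and feed them all to
 * one stateful processor so each record may depend on the ones before it.
 */
public class SingleReducerOrder {

    /** Hypothetical stand-in for a DoFn that carries state across calls. */
    static class StatefulProcessor {
        private long runningTotal = 0;
        private final List<String> output = new ArrayList<>();

        void process(long key, long delta) {
            // The result depends on every record processed earlier.
            runningTotal += delta;
            output.add(key + ":" + runningTotal);
        }

        List<String> results() {
            return output;
        }
    }

    /** Groups by key (sorted ascending) and processes in one sequential pass. */
    static List<String> processInKeyOrder(List<Map.Entry<Long, Long>> records) {
        // TreeMap plays the role of the shuffle: keys come out sorted and
        // values sharing a key are collected together in arrival order.
        TreeMap<Long, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<Long, Long> r : records) {
            grouped.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        // A single processor instance models a single reducer, so the
        // processing order is the global key order.
        StatefulProcessor reducer = new StatefulProcessor();
        for (Map.Entry<Long, List<Long>> e : grouped.entrySet()) {
            for (long delta : e.getValue()) {
                reducer.process(e.getKey(), delta);
            }
        }
        return reducer.results();
    }

    public static void main(String[] args) {
        // Input arrives out of order; output follows ascending key order.
        List<Map.Entry<Long, Long>> input = List.of(
                Map.entry(30L, 5L),
                Map.entry(10L, 1L),
                Map.entry(20L, 2L));
        System.out.println(processInKeyOrder(input)); // prints [10:1, 20:3, 30:8]
    }
}
```

The trade-off is the one implied in the thread: routing everything through one reducer gives a single global order that dependent records can rely on, at the cost of giving up parallelism in that stage.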
