crunch-user mailing list archives

From Allan Shoup <allan.sh...@gmail.com>
Subject Re: Reliably Parallelizing CPU-Intensive DoFns
Date Fri, 26 Sep 2014 04:03:06 GMT
I failed to mention that I don't have an opportunity to read the source
- my input is a PTable of Avro keys and values.

On Thu, Sep 25, 2014 at 8:48 PM, Josh Wills <josh.wills@gmail.com> wrote:

> NLineSource, to control how many shards the small input is split up into?
>
> J
>
> On Thu, Sep 25, 2014 at 6:10 PM, Allan Shoup <allan.shoup@gmail.com>
> wrote:
>
>> I have a very CPU-intensive DoFn which is running over a relatively small
>> input. Running on a Hadoop cluster, the job that it is run in sometimes
>> executes the function in map tasks and sometimes in reduce tasks. What's
>> the best way to reliably increase parallelization?
>>
>> One option may be to force a reduce step and control the number of
>> reducers. Are there any better options?
>>
>
>
