crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Reliably Parallelizing CPU-Intensive DoFns
Date Fri, 26 Sep 2014 01:48:37 GMT
NLineSource, to control how many shards the small input is split up into?

J
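This sketch shows the arithmetic behind the NLineSource suggestion. It assumes the reply means Crunch's `NLineFileSource` (package `org.apache.crunch.io.text`), which wraps Hadoop's `NLineInputFormat`. That input format gives each map task a fixed number of input lines, so the lines-per-task value determines how many map tasks a small input is split into. The Java below does not use Crunch or Hadoop. `linesPerTask` and `splitSizes` are hypothetical helpers, and the input size and task count are made-up values.

```java
import java.util.ArrayList;
import java.util.List;

public class NLineSplitSketch {
    // Hypothetical helper: lines per map task so that totalLines lines are
    // divided into at most targetTasks splits (ceiling division).
    static int linesPerTask(long totalLines, int targetTasks) {
        if (totalLines <= 0 || targetTasks <= 0) {
            throw new IllegalArgumentException("totalLines and targetTasks must be positive");
        }
        return (int) ((totalLines + targetTasks - 1) / targetTasks);
    }

    // Hypothetical helper: mimics an N-line split by handing consecutive
    // chunks of linesPerTask lines to each task; returns each task's line count.
    static List<Integer> splitSizes(long totalLines, int linesPerTask) {
        List<Integer> sizes = new ArrayList<>();
        long remaining = totalLines;
        while (remaining > 0) {
            int n = (int) Math.min(linesPerTask, remaining);
            sizes.add(n);
            remaining -= n;
        }
        return sizes;
    }

    public static void main(String[] args) {
        long totalLines = 1000; // made-up size of the small input
        int targetTasks = 40;   // made-up desired number of map tasks
        int n = linesPerTask(totalLines, targetTasks);
        List<Integer> sizes = splitSizes(totalLines, n);
        // prints "linesPerTask=25, mapTasks=40"
        System.out.println("linesPerTask=" + n + ", mapTasks=" + sizes.size());
    }
}
```

A value computed this way, used as the lines-per-task setting, should yield at most `targetTasks` map tasks. Because of the ceiling division the count can come out lower. For example, 1000 lines with a target of 300 tasks gives 4 lines per task and 250 tasks.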

On Thu, Sep 25, 2014 at 6:10 PM, Allan Shoup <allan.shoup@gmail.com> wrote:

> I have a very CPU-intensive DoFn that runs over a relatively small
> input. On a Hadoop cluster, the job it runs in sometimes executes the
> function in map tasks and sometimes in reduce tasks. What's the best way
> to reliably increase parallelization?
>
> One option may be to force a reduce step and control the number of
> reducers. Are there any better options?
>
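This sketch covers the reduce-side option raised in the question. In Crunch, that option roughly means keying the collection so records spread evenly, then calling `groupByKey(int numPartitions)` on the resulting `PTable` to fix the reducer count, so that a DoFn applied after the grouping runs in those reducers. The Java below simulates only that distribution, without Crunch. It tags each record with a hypothetical round-robin key (`i % numReducers`) and assigns it to a reducer using the formula from Hadoop's default `HashPartitioner`. That formula is an assumption here, since the partitioner Crunch actually uses may differ.

```java
import java.util.Arrays;
import java.util.List;

public class ReducerSpreadSketch {
    // Partition formula used by Hadoop's default HashPartitioner
    // (assumed here; Crunch's own partitioner may differ).
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Hypothetical keying step: tag record i with key i % numReducers, then
    // count how many records each reducer would receive after the shuffle.
    static int[] recordsPerReducer(List<String> records, int numReducers) {
        int[] counts = new int[numReducers];
        for (int i = 0; i < records.size(); i++) {
            Integer syntheticKey = i % numReducers;
            counts[partitionFor(syntheticKey, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9");
        // prints "[3, 3, 2, 2]"
        System.out.println(Arrays.toString(recordsPerReducer(records, 4)));
    }
}
```

With 10 records and 4 reducers, every reducer gets 2 or 3 records. The trade-off is an extra shuffle of the input, which should be cheap relative to CPU-bound work when the input is small.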
