crunch-user mailing list archives

From Josh Wills <>
Subject Re: Reliably Parallelizing CPU-Intensive DoFns
Date Fri, 26 Sep 2014 01:48:37 GMT
NLineSource, to control how many shards the small input is split up into?
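
A rough sketch of the arithmetic behind that suggestion, assuming the source hands each map task a fixed number of input lines (as Hadoop's NLineInputFormat does). The class and method names here are hypothetical, and the Crunch call in the trailing comment (NLineFileSource from org.apache.crunch.io.text with a lines-per-task argument) is an assumed constructor shape that should be checked against the Crunch version in use.

```java
// Hypothetical helper: picks a lines-per-task value for a line-based source
// so that a small input is spread over roughly the desired number of map tasks.
class NLineSharding {

    // Number of map tasks when every split holds at most linesPerTask lines.
    static long mapTasksFor(long totalLines, int linesPerTask) {
        if (totalLines < 0 || linesPerTask <= 0) {
            throw new IllegalArgumentException("totalLines must be >= 0 and linesPerTask > 0");
        }
        // Ceiling division: a partial final split still needs its own task.
        return (totalLines + linesPerTask - 1) / linesPerTask;
    }

    // Lines per task that yields at most desiredTasks map tasks for totalLines.
    static int linesPerTaskFor(long totalLines, int desiredTasks) {
        if (totalLines < 0 || desiredTasks <= 0) {
            throw new IllegalArgumentException("totalLines must be >= 0 and desiredTasks > 0");
        }
        long perTask = (totalLines + desiredTasks - 1) / desiredTasks;
        // Never go below one line per task, even for an empty input.
        return (int) Math.max(1L, perTask);
    }

    public static void main(String[] args) {
        long totalLines = 1000;   // assumed size of the small input
        int desiredTasks = 50;    // assumed target parallelism
        int linesPerTask = linesPerTaskFor(totalLines, desiredTasks);
        System.out.println(linesPerTask + " lines per task gives "
                + mapTasksFor(totalLines, linesPerTask) + " map tasks");
        // With Crunch, linesPerTask would then be passed to the source, e.g.
        // (assumed constructor shape, not verified here):
        //   pipeline.read(new NLineFileSource<String>(inputPath, Writables.strings(), linesPerTask));
    }
}
```

Because the split count is fixed by the input's line count rather than by block size, the DoFn's parallelism in the map phase no longer depends on how the planner places it, provided the DoFn runs directly on the source's output.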


On Thu, Sep 25, 2014 at 6:10 PM, Allan Shoup <> wrote:

> I have a very CPU-intensive DoFn which is running over a relatively small
> input. Running on a Hadoop cluster, the job that it runs in sometimes
> executes the function in map tasks and sometimes in reduce tasks. What's
> the best way to reliably increase parallelization?
> One option may be to force a reduce step and control the number of
> reducers. Are there any better options?
