crunch-user mailing list archives

From Allan Shoup <>
Subject Reliably Parallelizing CPU-Intensive DoFns
Date Fri, 26 Sep 2014 01:10:48 GMT
I have a very CPU-intensive DoFn that runs over a relatively small
input. On a Hadoop cluster, the job that contains it sometimes
executes the function in map tasks and sometimes in reduce tasks. What's
the best way to reliably increase parallelization?

One option may be to force a reduce step and control the number of
reducers. Are there any better options?
