crunch-user mailing list archives

From Josh Wills <>
Subject Re: crunch planner parameters
Date Mon, 12 Oct 2015 22:38:23 GMT
Hey Ravi,

The number of reducers used in the various stages of the MapReduce job can
change if you don't hard-code them using groupByKey(int numReducers) or
groupByKey(GroupingOptions) (or the equivalent settings via the
JoinStrategy classes for joins). The planner estimates the number of bytes
each stage will process and aims for roughly 1GB of data per reducer. If
you do hard-code the number of reduce tasks, the planner will respect your
wishes no matter what the input size is.
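To make the rule above concrete, here is a minimal, self-contained sketch of the decision described: a hard-coded reducer count always wins, and otherwise the count scales with the estimated input size at about 1GB per reducer. This is not Crunch's planner code. The class and method names are hypothetical, "1GB" is assumed to mean 10^9 bytes, and the ceiling rounding and minimum of one reducer are assumptions; the real planner may round differently or apply extra limits.

```java
import java.util.OptionalInt;

// Hypothetical illustration of the reducer-count rule; not Crunch's actual code.
class ReducerEstimate {
    // Assumed target of roughly 1GB of input per reducer (taken as 10^9 bytes).
    static final long BYTES_PER_REDUCER = 1_000_000_000L;

    // hardCoded models a value passed via groupByKey(int) or GroupingOptions.
    static int estimateReducers(long estimatedInputBytes, OptionalInt hardCoded) {
        // A hard-coded count is respected regardless of input size.
        if (hardCoded.isPresent()) {
            return hardCoded.getAsInt();
        }
        // Otherwise round up so no reducer gets much more than ~1GB,
        // and always use at least one reducer (assumed behavior).
        long needed = (estimatedInputBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER;
        return Math.toIntExact(Math.max(1L, needed));
    }

    public static void main(String[] args) {
        long[] sizes = {200_000_000L, 5_000_000_000L, 50_000_000_000L};
        for (long size : sizes) {
            System.out.printf("%,d bytes -> estimated: %d, hard-coded: %d%n",
                    size,
                    estimateReducers(size, OptionalInt.empty()),
                    estimateReducers(size, OptionalInt.of(10)));
        }
    }
}
```

Under these assumptions, the same logical pipeline gets 1, 5, or 50 reducers depending on input size, while a hard-coded value of 10 stays at 10. That is the sense in which the physical execution plan is not fixed unless you pin the reducer counts yourself.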


On Mon, Oct 12, 2015 at 2:31 PM, Ravi Kolluri <> wrote:

> Hello Crunch users,
> I have a question about what parameters go into the Crunch planner.
> Let's say I have a Crunch job with a set of input tables, and a fixed set
> of calls to parallelDo and groupBy operations. Does the Crunch execution
> plan stay fixed independent of the size distribution of the inputs?
> thanks,
> Ravi
> *DISCLAIMER:* The contents of this email, including any attachments, may
> contain information that is confidential, proprietary in nature, protected
> health information (PHI), or otherwise protected by law from disclosure,
> and is solely for the use of the intended recipient(s). If you are not the
> intended recipient, you are hereby notified that any use, disclosure or
> copying of this email, including any attachments, is unauthorized and
> strictly prohibited. If you have received this email in error, please
> notify the sender of this email. Please delete this and all copies of this
> email from your system. Any opinions either expressed or implied in this
> email and all attachments, are those of its author only, and do not
> necessarily reflect those of Nuna Health, Inc.
