crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: crunch planner parameters
Date Mon, 12 Oct 2015 22:57:41 GMT
It is the latter approach, yes. The former would be better.

J
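A small sketch of the compounding effect raised in the quoted question, assuming each stage's output size is predicted by multiplying its input size by a guessed size factor. The class name, method names, and the numbers are hypothetical and only illustrate the arithmetic; this is not the planner's actual code.

```java
public final class EstimateDrift {
    // Up-front planning: the prediction for a stage is chained from the
    // original source size through every guessed factor up to that stage.
    static double plannedUpFront(double sourceSize, double[] guessedFactors, int stage) {
        double size = sourceSize;
        for (int i = 0; i <= stage; i++) {
            size *= guessedFactors[i];
        }
        return size;
    }

    // Per-job planning: the stage's real input size is known when the job
    // starts, so only that stage's own factor is a guess.
    static double plannedPerJob(double actualInputSize, double guessedFactor) {
        return actualInputSize * guessedFactor;
    }

    public static void main(String[] args) {
        // Hypothetical pipeline: 100 units of input, three stages that each
        // really halve the data, while every guessed factor is 1.0.
        double[] guessed = {1.0, 1.0, 1.0};
        double actualInputToThirdStage = 100.0 * 0.5 * 0.5;        // 25
        double actualOutputOfThirdStage = actualInputToThirdStage * 0.5; // 12.5

        double upFront = plannedUpFront(100.0, guessed, 2);
        double perJob = plannedPerJob(actualInputToThirdStage, guessed[2]);

        System.out.println("up-front overestimate: " + upFront / actualOutputOfThirdStage + "x");
        System.out.println("per-job overestimate:  " + perJob / actualOutputOfThirdStage + "x");
    }
}
```

With these made-up numbers the up-front prediction for the third stage is off by the product of all three per-stage errors, while the per-job prediction is off only by the third stage's own error.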

On Mon, Oct 12, 2015 at 3:56 PM, Everett Anderson <everett@nuna.com> wrote:

> Hey Josh,
>
> Somewhat related question -- when computing the number of reducers, is the
> planner doing that at the start of each MR job, estimating the size of the
> map output and then calculating number of reducers based on the input data
> size going into the job?
>
> Or does it make the calculation at the very beginning of the pipeline
> after reading the sources?
>
> The former might be more accurate, with the latter suffering a compounding
> effect from poor estimation at any step.
>
>
>
> On Mon, Oct 12, 2015 at 3:46 PM, Josh Wills <josh.wills@gmail.com> wrote:
>
>> No, just the number of tasks involved in each job. The structure should
>> remain the same.
>>
>> J
>>
>> On Mon, Oct 12, 2015 at 3:44 PM, Ravi Kolluri <ravi@nuna.com> wrote:
>>
>>>
>>> Thanks Josh!
>>>
>>> My question was more about how the planner organizes the map-reduce
>>> computation. Would the crunch job composition change based on input size?
>>>
>>> thanks,
>>> Ravi
>>>
>>>
>>> On Mon, Oct 12, 2015 at 3:38 PM, Josh Wills <josh.wills@gmail.com>
>>> wrote:
>>>
>>>> Hey Ravi,
>>>>
>>>> The number of reducers used in the various stages of the MR job can
>>>> change if you don't hard-code them using groupByKey(int numReducers) or
>>>> groupByKey(GroupingOptions) (or the equivalent settings via the
>>>> JoinStrategy classes for joins). The planner will try to estimate the
>>>> number of bytes to be processed and aims to process 1GB of data per
>>>> reducer. If you do hard-code the number of reduce tasks, the planner will
>>>> respect your wishes no matter what the input size is.
>>>>
>>>> Josh
>>>>
>>>> On Mon, Oct 12, 2015 at 2:31 PM, Ravi Kolluri <ravi@nuna.com> wrote:
>>>>
>>>>> Hello Crunch users,
>>>>>
>>>>> I have a question about what parameters go into the Crunch planner.
>>>>>
>>>>> Let's say I have a Crunch job with a set of input tables, and a fixed
>>>>> set of calls to parallelDo and groupBy operations. Does the crunch
>>>>> execution plan stay fixed independent of the size distribution of the
>>>>> inputs?
>>>>>
>>>>> thanks,
>>>>> Ravi
>>>>>
>>>>>
>>>>> *DISCLAIMER:* The contents of this email, including any attachments,
>>>>> may contain information that is confidential, proprietary in nature,
>>>>> protected health information (PHI), or otherwise protected by law from
>>>>> disclosure, and is solely for the use of the intended recipient(s). If you
>>>>> are not the intended recipient, you are hereby notified that any use,
>>>>> disclosure or copying of this email, including any attachments, is
>>>>> unauthorized and strictly prohibited. If you have received this email in
>>>>> error, please notify the sender of this email. Please delete this and all
>>>>> copies of this email from your system. Any opinions either expressed or
>>>>> implied in this email and all attachments, are those of its author only,
>>>>> and do not necessarily reflect those of Nuna Health, Inc.
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
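The reducer-count rule described earlier in the thread (roughly 1GB of estimated data per reducer, unless the count is hard-coded) can be sketched as follows. This is an illustration under stated assumptions, not Crunch's implementation: the class and method names are hypothetical, 1GB is taken to mean 2^30 bytes, and rounding up with a minimum of one reducer is a guess at behavior the thread does not specify.

```java
public final class ReducerEstimate {
    // Assumed target: about 1 GB (here 2^30 bytes) of data per reducer.
    static final long BYTES_PER_REDUCER = 1024L * 1024L * 1024L;

    // hardCodedReducers <= 0 means "not set by the user" in this sketch.
    static int estimateReducers(long estimatedBytes, int hardCodedReducers) {
        if (hardCodedReducers > 0) {
            // A user-supplied count wins regardless of input size.
            return hardCodedReducers;
        }
        // Ceiling division, written to avoid overflow near Long.MAX_VALUE.
        long n = estimatedBytes / BYTES_PER_REDUCER
                + (estimatedBytes % BYTES_PER_REDUCER == 0 ? 0 : 1);
        return (int) Math.max(1L, Math.min(n, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long twoAndAHalfGb = 5L * BYTES_PER_REDUCER / 2;
        System.out.println(estimateReducers(twoAndAHalfGb, 0));  // size-based estimate
        System.out.println(estimateReducers(twoAndAHalfGb, 10)); // hard-coded count
    }
}
```

In a real pipeline the hard-coded case corresponds to calling groupByKey(int numReducers) as mentioned in the thread; the size-based branch stands in for whatever estimate the planner computes.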
