hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Assigning reduce tasks to specific nodes
Date Sat, 01 Dec 2012 12:27:04 GMT
Yes, scheduling is done on a Tasktracker heartbeat basis, so it is
certainly possible to do absolutely strict scheduling (although be
aware of the condition of failing/unavailable tasktrackers).
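(Editor's illustration, not Harsh's code.) The strict policy described above can be modeled in plain Java, with no Hadoop APIs: each pending task is pinned to one host, and a heartbeating tracker gets a task only if it is that host. All names here (StrictAssigner, pin, assignTask) are made up for the sketch; the starvation comment reflects the failing-tracker caveat above.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of "strict" scheduling on a heartbeat (not the real Hadoop
// TaskScheduler API): each pending task is pinned to exactly one host,
// and a heartbeating tracker receives a task only if it is that host.
public class StrictAssigner {
    private final Map<Integer, String> pinnedHost = new HashMap<>(); // taskId -> host
    private final Set<Integer> pending = new TreeSet<>();

    public void pin(int taskId, String host) {
        pinnedHost.put(taskId, host);
        pending.add(taskId);
    }

    // Called on a tracker heartbeat; returns the assigned task id, or -1.
    // Caveat: if the pinned host never heartbeats (failed/unavailable
    // tracker), its task starves forever under a strict policy.
    public int assignTask(String heartbeatingHost) {
        for (Iterator<Integer> it = pending.iterator(); it.hasNext(); ) {
            int taskId = it.next();
            if (pinnedHost.get(taskId).equals(heartbeatingHost)) {
                it.remove();
                return taskId;
            }
        }
        return -1; // nothing pinned to this host; strict policy assigns nothing
    }

    public static void main(String[] args) {
        StrictAssigner s = new StrictAssigner();
        s.pin(0, "xxx000");
        s.pin(1, "xxx001");
        System.out.println(s.assignTask("xxx001")); // task 1 is pinned here
        System.out.println(s.assignTask("xxx000")); // task 0 is pinned here
        System.out.println(s.assignTask("xxx000")); // -1: nothing left for this host
    }
}
```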

Mohit's suggestion is somewhat like what you desire (delay scheduling
in fair scheduler config) - but setting it to very high values is a bad
idea (for jobs that don't need this).
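(Editor's note.) In FairScheduler builds that include delay scheduling, the knob in question is the `mapred.fairscheduler.locality.delay` property in mapred-site.xml; availability and the exact property name depend on your Hadoop version, so treat this fragment as a sketch:

```xml
<property>
  <name>mapred.fairscheduler.locality.delay</name>
  <!-- milliseconds to hold a slot waiting for a data-local task
       before falling back to a non-local assignment -->
  <value>30000</value>
</property>
```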

On Sat, Dec 1, 2012 at 4:11 PM, Hiroyuki Yamada <mogwaing@gmail.com> wrote:
> Thank you all for the comments.
>> you ought to make sure your scheduler also does non-strict scheduling of
>> data local tasks for jobs that don't require such strictness
> I just want to make sure one thing.
> If I write my own scheduler, is it possible to do "strict" scheduling ?
> Thanks
> On Thu, Nov 29, 2012 at 1:56 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>> Look at locality delay parameter
>> Sent from my iPhone
>> On Nov 28, 2012, at 8:44 PM, Harsh J <harsh@cloudera.com> wrote:
>>> None of the current schedulers are "strict" in the sense of "do not
>>> schedule the task if such a tasktracker is not available". That has
>>> never been a requirement for Map/Reduce programs, nor should it be.
>>> I feel if you want some code to run individually on all nodes for
>>> whatever reason, you may as well ssh into each one and start it
>>> manually with appropriate host-based parameters, etc.. and then
>>> aggregate their results.
>>> Note that even if you get down to writing a scheduler for this (which
>>> I don't think is a good idea, but anyway), you ought to make sure your
>>> scheduler also does non-strict scheduling of data local tasks for jobs
>>> that don't require such strictness - in order for them to complete
>>> quickly rather than wait around for scheduling in a fixed manner.
>>> On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada <mogwaing@gmail.com> wrote:
>>>> Thank you all for the comments and advice.
>>>> I know it is not recommended to assign mapper locations myself,
>>>> but there needs to be one mapper running on each node in some cases,
>>>> so I need a strict way to do it.
>>>> So, locations are taken care of by the JobTracker (scheduler), but it is not strict,
>>>> and the only way to do it strictly is writing my own scheduler, right ?
>>>> I have checked the source and I am not sure where to modify to do it.
>>>> What I understand is FairScheduler and others are for scheduling
>>>> multiple jobs. Is this right ?
>>>> What I want to do is scheduling tasks in one job.
>>>> This can be achieved by FairScheduler and others ?
>>>> Regards,
>>>> Hiroyuki
>>>> On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
>>>> <michael_segel@hotmail.com> wrote:
>>>>> Mappers? Uhm... yes you can do it.
>>>>> Yes it is non-trivial.
>>>>> Yes, it is not recommended.
>>>>> I think we talk a bit about this in an InfoQ article written by Boris
>>>>> Lublinsky.
>>>>> It's kind of wild when your entire cluster map goes red in ganglia...
>>>>> On Nov 28, 2012, at 2:41 AM, Harsh J <harsh@cloudera.com> wrote:
>>>>> Hi,
>>>>> Mapper scheduling is indeed influenced by the getLocations() returned
>>>>> results of the InputSplit.
>>>>> The map task itself does not care about deserializing the location
>>>>> information, as it is of no use to it. The location information is vital to
>>>>> the scheduler (or in 0.20.2, the JobTracker), to which it is sent directly
>>>>> when a job is submitted. The locations are used pretty well here.
>>>>> You should be able to control (or rather, influence) mapper placement by
>>>>> working with the InputSplits, but not strictly so, because in the end it's up
>>>>> to your MR scheduler to do data-local or non-data-local assignments.
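(Editor's illustration, not part of the thread.) The "influence, not control" behavior Harsh describes can be modeled in plain Java, without Hadoop APIs: a split advertises preferred hosts (as InputSplit.getLocations() does), and a scheduler prefers a local split for the heartbeating host but still falls back to a non-local one. The class and method names below are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (not Hadoop's API) of how a split's preferred locations
// merely *influence* placement: the picker prefers a data-local split
// but will still hand out a non-local one rather than idle the tracker.
public class LocalityAwarePicker {
    static class Split {
        final String name;
        final List<String> locations; // preferred hosts, as from getLocations()
        Split(String name, String... hosts) {
            this.name = name;
            this.locations = Arrays.asList(hosts);
        }
    }

    private final List<Split> remaining = new ArrayList<>();

    public void add(Split s) { remaining.add(s); }

    // Prefer a split whose locations include this host; otherwise fall
    // back to any remaining split (non-strict, data-local-when-possible).
    public String pickFor(String host) {
        for (int i = 0; i < remaining.size(); i++) {
            if (remaining.get(i).locations.contains(host)) {
                return remaining.remove(i).name; // data-local assignment
            }
        }
        if (remaining.isEmpty()) return null;
        return remaining.remove(0).name; // non-local fallback
    }

    public static void main(String[] args) {
        LocalityAwarePicker p = new LocalityAwarePicker();
        p.add(new Split("split-0", "xxx000"));
        p.add(new Split("split-1", "xxx001"));
        System.out.println(p.pickFor("xxx001")); // data-local: split-1
        System.out.println(p.pickFor("xxx999")); // non-local fallback: split-0
    }
}
```

A strict variant would simply drop the fallback branch, which is exactly the behavior none of the stock schedulers implement.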
>>>>> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada <mogwaing@gmail.com>
>>>>> wrote:
>>>>>> Hi Harsh,
>>>>>> Thank you for the information.
>>>>>> I understand the current circumstances.
>>>>>> How about for mappers ?
>>>>>> As far as I tested, location information in InputSplit is ignored
>>>>>> in 0.20.2,
>>>>>> so there seems to be no easy way to assign mappers to specific nodes.
>>>>>> (I checked the source before and noticed that the
>>>>>> location information is not restored when deserializing the InputSplit
>>>>>> instance.)
>>>>>> Thanks,
>>>>>> Hiroyuki
>>>>>> On Wed, Nov 28, 2012 at 2:08 PM, Harsh J <harsh@cloudera.com> wrote:
>>>>>>> This is not supported/available currently even in MR2, but take a look
>>>>>>> at https://issues.apache.org/jira/browse/MAPREDUCE-199.
>>>>>>> On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada <mogwaing@gmail.com>
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>> I am wondering how I can assign reduce tasks to specific nodes.
>>>>>>>> What I want to do is, for example, assigning the reducer which generates
>>>>>>>> part-00000 to node xxx000,
>>>>>>>> and part-00001 to node xxx001 and so on.
>>>>>>>> I think it's about task assignment scheduling but
>>>>>>>> I am not sure where to customize to achieve this.
>>>>>>>> Is this done by writing some extensions ?
>>>>>>>> or any easier way to do this ?
>>>>>>>> Regards,
>>>>>>>> Hiroyuki
>>>>>>> --
>>>>>>> Harsh J
>>>>> --
>>>>> Harsh J
>>> --
>>> Harsh J

Harsh J
