hadoop-user mailing list archives

From Hiroyuki Yamada <mogwa...@gmail.com>
Subject Re: Assigning reduce tasks to specific nodes
Date Sat, 01 Dec 2012 10:41:16 GMT
Thank you all for the comments.

> you ought to make sure your scheduler also does non-strict scheduling of data local tasks
> for jobs that don't require such strictness

I just want to confirm one thing.
If I write my own scheduler, is it possible to do "strict" scheduling?

Thanks
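
To make the strict versus non-strict distinction concrete, here is a minimal, self-contained toy model in Java. It is only a sketch and does not use any Hadoop scheduler API: the class LocalitySketch, its methods, and the maxSkips knob are all hypothetical names chosen for illustration. maxSkips loosely stands in for the "locality delay" idea mentioned in the replies quoted below. Each live host offers one free slot per round; a strict policy leaves the slot idle unless a task prefers that host, while the delay-based policy lets a task run elsewhere once it has been passed over often enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are Hadoop APIs.
final class LocalitySketch {

    // A pending task and the host it would prefer to run on.
    static final class Task {
        final String name;
        final String preferredHost;
        int skips = 0; // slot offers that passed this task over

        Task(String name, String preferredHost) {
            this.name = name;
            this.preferredHost = preferredHost;
        }
    }

    // Chooses a task for a free slot on `host`; null leaves the slot idle.
    // maxSkips < 0 means strict: a task only ever runs on its preferred host.
    static Task pick(List<Task> pending, String host, int maxSkips) {
        // First choice: a task whose preferred host is the one offering the slot.
        for (Iterator<Task> it = pending.iterator(); it.hasNext();) {
            Task t = it.next();
            if (t.preferredHost.equals(host)) {
                it.remove();
                return t;
            }
        }
        // Non-strict fallback: a task that has waited long enough runs anywhere.
        if (maxSkips >= 0) {
            for (Iterator<Task> it = pending.iterator(); it.hasNext();) {
                Task t = it.next();
                if (t.skips >= maxSkips) {
                    it.remove();
                    return t;
                }
            }
        }
        // Nothing was assigned, so every pending task has been skipped once more.
        for (Task t : pending) {
            t.skips++;
        }
        return null;
    }

    // Each round, every live host offers one slot. Returns task name -> host.
    static Map<String, String> run(List<Task> tasks, List<String> hosts,
                                   int maxSkips, int maxRounds) {
        List<Task> pending = new ArrayList<>(tasks);
        Map<String, String> placement = new LinkedHashMap<>();
        for (int round = 0; round < maxRounds && !pending.isEmpty(); round++) {
            for (String host : hosts) {
                Task t = pick(pending, host, maxSkips);
                if (t != null) {
                    placement.put(t.name, host);
                }
            }
        }
        return placement;
    }

    // t2 prefers node9, which is not among the live hosts used in main.
    static List<Task> demoTasks() {
        return Arrays.asList(
                new Task("t0", "node0"),
                new Task("t1", "node1"),
                new Task("t2", "node9"));
    }

    public static void main(String[] args) {
        List<String> live = Arrays.asList("node0", "node1", "node2");
        System.out.println("strict:  " + run(demoTasks(), live, -1, 5));
        System.out.println("delayed: " + run(demoTasks(), live, 2, 5));
    }
}
```

In this toy run, the strict policy never schedules t2 because its preferred host never offers a slot, whereas with maxSkips = 2 it ends up on node1. That is the trade-off raised in the thread: strictness guarantees placement but can stall a job, while a delay keeps most tasks local and still lets the job finish.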

On Thu, Nov 29, 2012 at 1:56 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> Look at locality delay parameter
>
> Sent from my iPhone
>
> On Nov 28, 2012, at 8:44 PM, Harsh J <harsh@cloudera.com> wrote:
>
>> None of the current schedulers are "strict" in the sense of "do not
>> schedule the task if such a tasktracker is not available". That has
>> never been a requirement for Map/Reduce programs, nor should it be.
>>
>> I feel if you want some code to run individually on all nodes for
>> whatever reason, you may as well ssh into each one and start it
>> manually with appropriate host-based parameters, etc., and then
>> aggregate their results.
>>
>> Note that even if you get down to writing a scheduler for this (which
>> I don't think is a good idea, but anyway), you ought to make sure your
>> scheduler also does non-strict scheduling of data local tasks for jobs
>> that don't require such strictness, so that they complete quickly
>> rather than wait around to be scheduled in a fixed manner.
>>
>> On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada <mogwaing@gmail.com> wrote:
>>> Thank you all for the comments and advice.
>>>
>>> I know it is not recommended to assign mapper locations myself.
>>> But in some cases there needs to be one mapper running on each node,
>>> so I need a strict way to do it.
>>>
>>> So, locations are taken care of by the JobTracker (scheduler), but not strictly.
>>> And the only way to do it strictly is to write my own scheduler, right?
>>>
>>> I have checked the source, but I am not sure where to modify it.
>>> What I understand is that FairScheduler and the others are for scheduling
>>> multiple jobs. Is this right?
>>> What I want to do is schedule the tasks within one job.
>>> Can this be achieved with FairScheduler and the others?
>>>
>>> Regards,
>>> Hiroyuki
>>>
>>> On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
>>> <michael_segel@hotmail.com> wrote:
>>>> Mappers? Uhm... yes you can do it.
>>>> Yes it is non-trivial.
>>>> Yes, it is not recommended.
>>>>
>>>> I think we talk a bit about this in an InfoQ article written by Boris
>>>> Lublinsky.
>>>>
>>>> It's kind of wild when your entire cluster map goes red in ganglia... :-)
>>>>
>>>>
>>>> On Nov 28, 2012, at 2:41 AM, Harsh J <harsh@cloudera.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Mapper scheduling is indeed influenced by the results returned by the
>>>> InputSplit's getLocations().
>>>>
>>>> The map task itself does not care about deserializing the location
>>>> information, as it is of no use to it. The location information is vital to
>>>> the scheduler (or in 0.20.2, the JobTracker), to which it is sent directly
>>>> when a job is submitted. The locations are used pretty well here.
>>>>
>>>> You should be able to control (or rather, influence) mapper placement by
>>>> working with the InputSplits, but not strictly so, because in the end it's up
>>>> to your MR scheduler to make data-local or non-data-local assignments.
>>>>
>>>>
>>>> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada <mogwaing@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi Harsh,
>>>>>
>>>>> Thank you for the information.
>>>>> I understand the current circumstances.
>>>>>
>>>>> How about mappers?
>>>>> As far as I have tested, the location information in InputSplit is ignored
>>>>> in 0.20.2,
>>>>> so there seems to be no easy way to assign mappers to specific nodes.
>>>>> (I checked the source earlier and noticed that the
>>>>> location information is not restored when deserializing the InputSplit
>>>>> instance.)
>>>>>
>>>>> Thanks,
>>>>> Hiroyuki
>>>>>
>>>>> On Wed, Nov 28, 2012 at 2:08 PM, Harsh J <harsh@cloudera.com> wrote:
>>>>>> This is not supported/available currently even in MR2, but take a look at
>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-199.
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada <mogwaing@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am wondering how I can assign reduce tasks to specific nodes.
>>>>>>> What I want to do is, for example, assign the reducer that produces
>>>>>>> part-00000 to node xxx000,
>>>>>>> and part-00001 to node xxx001, and so on.
>>>>>>>
>>>>>>> I think it's about task assignment scheduling, but
>>>>>>> I am not sure where to customize to achieve this.
>>>>>>> Is this done by writing some extensions?
>>>>>>> Or is there an easier way to do this?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Hiroyuki
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>
>>
>>
>> --
>> Harsh J
