hadoop-mapreduce-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: Assigning reduce tasks to specific nodes
Date Thu, 29 Nov 2012 04:56:38 GMT
Look at the locality delay parameter.

On Nov 28, 2012, at 8:44 PM, Harsh J <harsh@cloudera.com> wrote:

> None of the current schedulers are "strict" in the sense of "do not
> schedule the task if such a tasktracker is not available". That has
> never been a requirement for Map/Reduce programs, nor should it be.
> 
> I feel if you want some code to run individually on all nodes for
> whatever reason, you may as well ssh into each one and start it
> manually with appropriate host-based parameters, etc., and then
> aggregate their results.
> 
> Note that even if you get down to writing a scheduler for this (which
> I don't think is a good idea, but anyway), you ought to make sure your
> scheduler also does non-strict scheduling of data local tasks for jobs
> that don't require such strictness - in order for them to complete
> quickly rather than wait around for scheduling in a fixed manner.
> 
> On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada <mogwaing@gmail.com> wrote:
>> Thank you all for the comments and advice.
>> 
>> I know it is not recommended to assign mapper locations myself.
>> But in some cases there needs to be one mapper running on each node,
>> so I need a strict way to do it.
>> 
>> So, locations are taken care of by the JobTracker (scheduler), but not strictly.
>> And the only way to do it strictly is to write my own scheduler, right?
>> 
>> I have checked the source, but I am not sure where to modify it to do this.
>> My understanding is that FairScheduler and the others are for scheduling
>> multiple jobs. Is this right?
>> What I want to do is schedule tasks within one job.
>> Can this be achieved with FairScheduler and the others?
>> 
>> Regards,
>> Hiroyuki
>> 
>> On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
>> <michael_segel@hotmail.com> wrote:
>>> Mappers? Uhm... yes you can do it.
>>> Yes it is non-trivial.
>>> Yes, it is not recommended.
>>> 
>>> I think we talk a bit about this in an InfoQ article written by Boris
>>> Lublinsky.
>>> 
>>> It's kind of wild when your entire cluster map goes red in ganglia... :-)
>>> 
>>> 
>>> On Nov 28, 2012, at 2:41 AM, Harsh J <harsh@cloudera.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Mapper scheduling is indeed influenced by the results returned by the
>>> InputSplit's getLocations().
>>> 
>>> The map task itself does not care about deserializing the location
>>> information, as it is of no use to it. The location information is vital to
>>> the scheduler (or in 0.20.2, the JobTracker), to which it is sent directly
>>> when a job is submitted. The locations are used pretty well here.
>>> 
>>> You should be able to control (or rather, influence) mapper placement by
>>> working with the InputSplits, but not strictly so, because in the end it's up
>>> to your MR scheduler to do data local or non data local assignments.
>>> 
>>> 
>>> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada <mogwaing@gmail.com>
>>> wrote:
>>>> 
>>>> Hi Harsh,
>>>> 
>>>> Thank you for the information.
>>>> I understand the current circumstances.
>>>> 
>>>> How about mappers?
>>>> As far as I have tested, location information in InputSplit is ignored in
>>>> 0.20.2,
>>>> so there seems to be no easy way to assign mappers to specific nodes.
>>>> (I checked the source earlier and noticed that
>>>> location information is not restored when deserializing the InputSplit
>>>> instance.)
>>>> 
>>>> Thanks,
>>>> Hiroyuki
>>>> 
>>>> On Wed, Nov 28, 2012 at 2:08 PM, Harsh J <harsh@cloudera.com> wrote:
>>>>> This is not supported/available currently even in MR2, but take a look
>>>>> at
>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-199.
>>>>> 
>>>>> 
>>>>> On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada <mogwaing@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I am wondering how I can assign reduce tasks to specific nodes.
>>>>>> What I want to do is, for example, assign the reducer that produces
>>>>>> part-00000 to node xxx000,
>>>>>> and part-00001 to node xxx001, and so on.
>>>>>> 
>>>>>> I think it's about task assignment scheduling, but
>>>>>> I am not sure where to customize to achieve this.
>>>>>> Is this done by writing an extension,
>>>>>> or is there an easier way to do this?
>>>>>> 
>>>>>> Regards,
>>>>>> Hiroyuki
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Harsh J
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Harsh J
> 
> 
> 
> -- 
> Harsh J
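
For readers following the thread, the difference between "strict" placement and the non-strict, locality-preferring behaviour discussed above can be sketched with a toy model. This is not Hadoop's scheduler code: the class LocalityDelaySketch, its Task type, the onHeartbeat method and the 5000 ms delay are all hypothetical names and values chosen for illustration, and the host and output names are borrowed from the original question. The idea, assuming a simple delay-style policy: a free slot is given to a task that prefers that host if one exists; otherwise the slot is declined until a locality delay has elapsed, after which a task is run non-locally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Toy model of non-strict, locality-aware task assignment. Not Hadoop code. */
final class LocalityDelaySketch {

    /** A pending task and the hosts it would prefer to run on (hypothetical type). */
    static final class Task {
        final String name;
        final List<String> preferredHosts;

        Task(String name, List<String> preferredHosts) {
            this.name = name;
            this.preferredHosts = preferredHosts;
        }
    }

    private final List<Task> pending;
    private final long localityDelayMs;
    private long waitingSinceMs = -1; // -1 means no slot has been declined yet

    LocalityDelaySketch(long localityDelayMs, List<Task> tasks) {
        this.localityDelayMs = localityDelayMs;
        this.pending = new ArrayList<>(tasks);
    }

    /** A host reports a free slot at time nowMs; returns the task assigned to it, or null. */
    Task onHeartbeat(String host, long nowMs) {
        // First choice: a task that lists this host as a preferred location.
        for (Iterator<Task> it = pending.iterator(); it.hasNext();) {
            Task t = it.next();
            if (t.preferredHosts.contains(host)) {
                it.remove();
                waitingSinceMs = -1;
                return t;
            }
        }
        if (pending.isEmpty()) {
            return null;
        }
        // No local task for this host: decline the slot until the delay has elapsed.
        if (waitingSinceMs < 0) {
            waitingSinceMs = nowMs;
        }
        if (nowMs - waitingSinceMs >= localityDelayMs) {
            waitingSinceMs = -1;
            return pending.remove(0); // give up on locality and run non-locally
        }
        return null;
    }

    private static String describe(Task t) {
        return t == null ? "nothing assigned" : t.name;
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
            new Task("part-00000", Arrays.asList("xxx000")),
            new Task("part-00001", Arrays.asList("xxx001")));
        LocalityDelaySketch scheduler = new LocalityDelaySketch(5000, tasks);

        // xxx002 holds no preferred task, so its slot is declined at first.
        System.out.println("t=0    xxx002: " + describe(scheduler.onHeartbeat("xxx002", 0)));
        // xxx001 is a preferred host, so it receives its local task immediately.
        System.out.println("t=1000 xxx001: " + describe(scheduler.onHeartbeat("xxx001", 1000)));
        System.out.println("t=2000 xxx002: " + describe(scheduler.onHeartbeat("xxx002", 2000)));
        // Once the delay has elapsed, the remaining task runs on a non-preferred host.
        System.out.println("t=7000 xxx002: " + describe(scheduler.onHeartbeat("xxx002", 7000)));
    }
}
```

In this model a delay of 0 never waits for a preferred host, while a very large delay approximates strict placement: a task whose preferred host never reports a free slot is then never run, which is the cost of strictness raised in the replies above.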
