hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Assigning reduce tasks to specific nodes
Date Thu, 29 Nov 2012 04:44:37 GMT
None of the current schedulers are "strict" in the sense of "do not
schedule the task if such a tasktracker is not available". That has
never been a requirement for Map/Reduce programs, nor should it be.

I feel that if you want some code to run individually on every node,
for whatever reason, you may as well ssh into each one, start it
manually with the appropriate host-based parameters, and then
aggregate the results.

Note that even if you do get down to writing a scheduler for this
(which I don't think is a good idea, but anyway), you ought to make
sure your scheduler still does non-strict scheduling of data-local
tasks for jobs that don't require such strictness, so that they
complete quickly rather than waiting around to be scheduled in a
fixed manner.
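
To make the non-strict behaviour concrete, here is a minimal sketch in
plain Java. It does not use any Hadoop classes: PendingTask and assign
are hypothetical names made up for illustration, not part of any real
scheduler API. It only shows the idea of preferring a task whose
preferred hosts include the requesting node, and falling back to any
pending task instead of leaving the node idle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

class NonStrictAssignment {
    /** Hypothetical pending task with the hosts it would prefer to run on. */
    static final class PendingTask {
        final String id;
        final List<String> preferredHosts;

        PendingTask(String id, String... preferredHosts) {
            this.id = id;
            this.preferredHosts = Arrays.asList(preferredHosts);
        }
    }

    /**
     * Picks a task for the node `host` and removes it from `pending`.
     * A task that prefers `host` wins; otherwise the oldest pending task
     * is handed out anyway (non-strict), so the node is never left idle
     * while work remains.
     */
    static Optional<PendingTask> assign(List<PendingTask> pending, String host) {
        // First pass: look for a task that prefers this host.
        for (Iterator<PendingTask> it = pending.iterator(); it.hasNext();) {
            PendingTask task = it.next();
            if (task.preferredHosts.contains(host)) {
                it.remove();
                return Optional.of(task);
            }
        }
        // Fallback: no local task, so take whatever is first in the queue.
        if (pending.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(pending.remove(0));
    }

    public static void main(String[] args) {
        List<PendingTask> pending = new ArrayList<>();
        pending.add(new PendingTask("t0", "xxx000"));
        pending.add(new PendingTask("t1", "xxx001"));
        pending.add(new PendingTask("t2", "xxx001"));

        // Nodes ask for work in this order; xxx002 has no local task.
        for (String host : new String[] {"xxx001", "xxx002", "xxx000"}) {
            String got = assign(pending, host).map(t -> t.id).orElse("(none)");
            System.out.println(host + " <- " + got);
        }
    }
}
```

A strict variant would simply drop the fallback and return nothing when
no task prefers the host, which is exactly what makes jobs wait around.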

On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada <mogwaing@gmail.com> wrote:
> Thank you all for the comments and advice.
>
> I know it is not recommended to assign mapper locations myself.
> But in some cases there needs to be one mapper running on each node,
> so I need a strict way to do it.
>
> So, location is taken care of by the JobTracker (scheduler), but it is not strict.
> And the only way to do it strictly is to write my own scheduler, right?
>
> I have checked the source, and I am not sure where to modify it to do this.
> What I understand is that FairScheduler and the others are for scheduling
> multiple jobs. Is this right?
> What I want to do is schedule tasks within one job.
> Can this be achieved with FairScheduler and the others?
>
> Regards,
> Hiroyuki
>
> On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
> <michael_segel@hotmail.com> wrote:
>> Mappers? Uhm... yes you can do it.
>> Yes it is non-trivial.
>> Yes, it is not recommended.
>>
>> I think we talk a bit about this in an InfoQ article written by Boris
>> Lublinsky.
>>
>> It's kind of wild when your entire cluster map goes red in ganglia... :-)
>>
>>
>> On Nov 28, 2012, at 2:41 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>> Hi,
>>
>> Mapper scheduling is indeed influenced by the results returned by the
>> InputSplit's getLocations().
>>
>> The map task itself does not care about deserializing the location
>> information, as it is of no use to it. The location information is vital to
>> the scheduler (or in 0.20.2, the JobTracker), to which it is sent directly
>> when a job is submitted. The locations are used pretty well here.
>>
>> You should be able to control (or rather, influence) mapper placement by
>> working with the InputSplits, but not strictly so, because in the end it's up
>> to your MR scheduler to do data-local or non-data-local assignments.
>>
>>
>> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada <mogwaing@gmail.com>
>> wrote:
>>>
>>> Hi Harsh,
>>>
>>> Thank you for the information.
>>> I understand the current circumstances.
>>>
>>> How about mappers?
>>> As far as I tested, the location information in InputSplit is ignored in
>>> 0.20.2,
>>> so there seems to be no easy way to assign mappers to specific nodes.
>>> (I checked the source earlier and noticed that the
>>> location information is not restored when deserializing the InputSplit
>>> instance.)
>>>
>>> Thanks,
>>> Hiroyuki
>>>
>>> On Wed, Nov 28, 2012 at 2:08 PM, Harsh J <harsh@cloudera.com> wrote:
>>> > This is not supported/available currently even in MR2, but take a look
>>> > at
>>> > https://issues.apache.org/jira/browse/MAPREDUCE-199.
>>> >
>>> >
>>> > On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada <mogwaing@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I am wondering how I can assign reduce tasks to specific nodes.
>>> >> What I want to do is, for example, assign the reducer that produces
>>> >> part-00000 to node xxx000,
>>> >> and part-00001 to node xxx001, and so on.
>>> >>
>>> >> I think it's about task assignment scheduling, but
>>> >> I am not sure where to customize to achieve this.
>>> >> Is this done by writing some extensions?
>>> >> Or is there an easier way to do this?
>>> >>
>>> >> Regards,
>>> >> Hiroyuki
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Harsh J
>>
>>
>>
>>
>> --
>> Harsh J
>>
>>



-- 
Harsh J
