hadoop-user mailing list archives

From Hiroyuki Yamada <mogwa...@gmail.com>
Subject Re: Assigning reduce tasks to specific nodes
Date Thu, 29 Nov 2012 00:30:33 GMT
Thank you all for the comments and advice.

I know it is not recommended to assign mapper locations myself.
But in some cases there needs to be one mapper running on each node,
so I need a strict way to do it.

So, locations are taken care of by the JobTracker (scheduler), but not strictly.
And the only way to do it strictly is to write my own scheduler, right?

I have checked the source, but I am not sure what to modify to do it.
My understanding is that FairScheduler and the others are for scheduling
multiple jobs. Is this right?
What I want to do is schedule tasks within one job.
Can this be achieved with FairScheduler or the others?

Regards,
Hiroyuki

On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
<michael_segel@hotmail.com> wrote:
> Mappers? Uhm... yes you can do it.
> Yes it is non-trivial.
> Yes, it is not recommended.
>
> I think we talk a bit about this in an InfoQ article written by Boris
> Lublinsky.
>
> It's kind of wild when your entire cluster map goes red in ganglia... :-)
>
>
> On Nov 28, 2012, at 2:41 AM, Harsh J <harsh@cloudera.com> wrote:
>
> Hi,
>
> Mapper scheduling is indeed influenced by the results returned by the
> InputSplit's getLocations().
>
> The map task itself does not care about deserializing the location
> information, as it is of no use to it. The location information is vital to
> the scheduler (or in 0.20.2, the JobTracker), to which it is sent directly
> when a job is submitted. The locations are used pretty well here.
>
> You should be able to control (or rather, influence) mapper placement by
> working with the InputSplits, but not strictly so, because in the end it's up
> to your MR scheduler to do data-local or non-data-local assignments.
>
>
> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada <mogwaing@gmail.com>
> wrote:
>>
>> Hi Harsh,
>>
>> Thank you for the information.
>> I understand the current circumstances.
>>
>> How about for mappers ?
>> As far as I tested, location information in InputSplit is ignored in
>> 0.20.2,
>> so there seems to be no easy way to assign mappers to specific nodes.
>> (I checked the source earlier and noticed that
>> location information is not restored when deserializing the InputSplit
>> instance.)
>>
>> Thanks,
>> Hiroyuki
>>
>> On Wed, Nov 28, 2012 at 2:08 PM, Harsh J <harsh@cloudera.com> wrote:
>> > This is not supported/available currently even in MR2, but take a look
>> > at
>> > https://issues.apache.org/jira/browse/MAPREDUCE-199.
>> >
>> >
>> > On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada <mogwaing@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I am wondering how I can assign reduce tasks to specific nodes.
>> >> What I want to do is, for example, assign the reducer that produces
>> >> part-00000 to node xxx000,
>> >> and part-00001 to node xxx001, and so on.
>> >>
>> >> I think it's about task assignment scheduling, but
>> >> I am not sure where to customize to achieve this.
>> >> Is this done by writing some extensions?
>> >> Or is there an easier way to do this?
>> >>
>> >> Regards,
>> >> Hiroyuki
>> >
>> >
>> >
>> >
>> > --
>> > Harsh J
>
>
>
>
> --
> Harsh J
>
>
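The point made in the quoted thread, that InputSplit locations influence mapper placement but do not dictate it, can be illustrated with a small toy model. This is only a sketch and is not Hadoop's scheduler code: the class `LocalityHintDemo`, the `Split` type, and the `assign` method are hypothetical names invented for this example, and the "prefer a listed host, otherwise take any free slot" policy is an assumed simplification of what a locality-aware scheduler does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: NOT Hadoop code. All names here are hypothetical. */
final class LocalityHintDemo {

    /** Stand-in for an InputSplit: an id plus the hosts it would report as locations. */
    static final class Split {
        final String id;
        final List<String> preferredHosts;

        Split(String id, List<String> preferredHosts) {
            this.id = id;
            this.preferredHosts = preferredHosts;
        }
    }

    /**
     * Assigns each split to a host that still has a free slot.
     * A preferred host is tried first; if none has a free slot, the split
     * falls back to any host with capacity (a non-local assignment).
     * Splits that find no free slot at all are left unassigned.
     */
    static Map<String, String> assign(List<Split> splits, Map<String, Integer> freeSlots) {
        Map<String, Integer> slots = new LinkedHashMap<>(freeSlots); // work on a copy
        Map<String, String> assignment = new LinkedHashMap<>();
        for (Split split : splits) {
            String chosen = null;
            // First pass: honor the location hint if possible.
            for (String host : split.preferredHosts) {
                if (slots.getOrDefault(host, 0) > 0) {
                    chosen = host;
                    break;
                }
            }
            // Second pass: the hint is not strict, so any free host will do.
            if (chosen == null) {
                for (Map.Entry<String, Integer> entry : slots.entrySet()) {
                    if (entry.getValue() > 0) {
                        chosen = entry.getKey();
                        break;
                    }
                }
            }
            if (chosen != null) {
                slots.put(chosen, slots.get(chosen) - 1);
                assignment.put(split.id, chosen);
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Integer> freeSlots = new LinkedHashMap<>();
        freeSlots.put("node0", 1);
        freeSlots.put("node1", 1);

        // Both splits ask for node0, but node0 has only one free slot.
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("split-0", Arrays.asList("node0")));
        splits.add(new Split("split-1", Arrays.asList("node0")));

        // prints {split-0=node0, split-1=node1}
        System.out.println(assign(splits, freeSlots));
    }
}
```

In this model the second split ends up on node1 even though it asked for node0, which is the kind of non-strict behavior described in the thread. Strict "one task per node" placement would require changing the fallback policy itself, which in this toy corresponds to the scheduler rather than the split.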
