Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of mogwaing@gmail.com designates
 209.85.220.176 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <BLU0-SMTP10608F721F1A84DA12869318F5D0@phx.gbl>
References: 
 <CAPDOW74WdbrJAj7mfOR12K0R3UtpO1CHPrWUyWKbj+J9cvXCng@mail.gmail.com>
	<CAOcnVr01P0UP=vKOa0nV1s1WY6J771iEs9eM6DeziDtrbVxVFQ@mail.gmail.com>
	<CAPDOW75DVKf+pHZeT1Lw0xvHPHDNBorxkNb+eakk97AOSWscCg@mail.gmail.com>
	<CAOcnVr3NQ-QtsJwFgWbhkv5=6AxkRiEXGc=4mVid+4XNR_z6tQ@mail.gmail.com>
	<BLU0-SMTP10608F721F1A84DA12869318F5D0@phx.gbl>
Date: Thu, 29 Nov 2012 09:30:33 +0900
Message-ID: 
 <CAPDOW76J4=b_GefvOu4bHO2u-c8hBPcccxvRHiEFtRuoibCxPA@mail.gmail.com>
Subject: Re: Assigning reduce tasks to specific nodes
From: Hiroyuki Yamada <mogwaing@gmail.com>
To: user@hadoop.apache.org
Content-Type: text/plain; charset=ISO-8859-1

Thank you all for the comments and advice.

I know it is not recommended to assign mapper locations myself.
But in some cases there needs to be one mapper running on each node,
so I need a strict way to do it.

So, task locations are handled by the JobTracker (scheduler), but not strictly.
And the only way to enforce them strictly is to write my own scheduler, right?
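
To make the distinction concrete, here is a minimal sketch of hint-based
versus strict placement. It is a toy model in plain Java, not Hadoop code:
LocalityDemo, assignWithHints, and assignStrict are hypothetical names, and
each host is assumed to offer exactly one free slot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these are hypothetical names, not Hadoop classes.
final class LocalityDemo {

    // Hint-based placement: use the preferred host when it has a free slot,
    // otherwise fall back to any free host (a non-local assignment).
    static Map<String, String> assignWithHints(Map<String, String> preferredHostByTask,
                                               List<String> freeHosts) {
        List<String> free = new ArrayList<>(freeHosts);
        Map<String, String> placement = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : preferredHostByTask.entrySet()) {
            if (free.isEmpty()) {
                break; // no slots left; remaining tasks stay unplaced
            }
            String host = free.contains(e.getValue()) ? e.getValue() : free.get(0);
            free.remove(host);
            placement.put(e.getKey(), host);
        }
        return placement;
    }

    // Strict placement: a task runs only on its preferred host. If that slot
    // is already taken, the task is left out of the result (it has to wait).
    static Map<String, String> assignStrict(Map<String, String> preferredHostByTask,
                                            List<String> freeHosts) {
        List<String> free = new ArrayList<>(freeHosts);
        Map<String, String> placement = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : preferredHostByTask.entrySet()) {
            if (free.remove(e.getValue())) {
                placement.put(e.getKey(), e.getValue());
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, String> tasks = new LinkedHashMap<>();
        tasks.put("map0", "node0");
        tasks.put("map1", "node0"); // competes with map0 for node0
        tasks.put("map2", "node2");
        List<String> hosts = Arrays.asList("node0", "node1", "node2");

        System.out.println("hints:  " + assignWithHints(tasks, hosts));
        System.out.println("strict: " + assignStrict(tasks, hosts));
    }
}
```

With map0 and map1 both preferring node0, the hint-based version moves map1
to node1, while the strict version leaves map1 unplaced until node0 is free.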

I have checked the source, but I am not sure where to modify it.
My understanding is that FairScheduler and the others are for scheduling
multiple jobs. Is this right?
What I want to do is schedule tasks within one job.
Can this be achieved with FairScheduler and the others?

Regards,
Hiroyuki

On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
<michael_segel@hotmail.com> wrote:
> Mappers? Uhm... yes you can do it.
> Yes it is non-trivial.
> Yes, it is not recommended.
>
> I think we talk a bit about this in an InfoQ article written by Boris
> Lublinsky.
>
> It's kind of wild when your entire cluster map goes red in Ganglia... :-)
>
>
> On Nov 28, 2012, at 2:41 AM, Harsh J <harsh@cloudera.com> wrote:
>
> Hi,
>
> Mapper scheduling is indeed influenced by the results returned by the
> InputSplit's getLocations().
>
> The map task itself does not care about deserializing the location
> information, as it is of no use to it. The location information is vital to
> the scheduler (or in 0.20.2, the JobTracker), to which it is sent directly
> when a job is submitted. The locations are used pretty well there.
>
> You should be able to control (or rather, influence) mapper placement by
> working with the InputSplits, but not strictly so, because in the end it's up
> to your MR scheduler to make data-local or non-data-local assignments.
>
>
> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada <mogwaing@gmail.com>
> wrote:
>>
>> Hi Harsh,
>>
>> Thank you for the information.
>> I understand the current circumstances.
>>
>> How about mappers?
>> As far as I tested, location information in InputSplit is ignored in
>> 0.20.2,
>> so there seems to be no easy way to assign mappers to specific nodes.
>> (I checked the source earlier and noticed that
>> location information is not restored when deserializing the InputSplit
>> instance.)
>>
>> Thanks,
>> Hiroyuki
>>
>> On Wed, Nov 28, 2012 at 2:08 PM, Harsh J <harsh@cloudera.com> wrote:
>> > This is not supported/available currently even in MR2, but take a look
>> > at
>> > https://issues.apache.org/jira/browse/MAPREDUCE-199.
>> >
>> >
>> > On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada <mogwaing@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I am wondering how I can assign reduce tasks to specific nodes.
>> >> What I want to do is, for example, assign the reducer that produces
>> >> part-00000 to node xxx000,
>> >> part-00001 to node xxx001, and so on.
>> >>
>> >> I think it's about task assignment scheduling, but
>> >> I am not sure where to customize to achieve this.
>> >> Is this done by writing some extensions,
>> >> or is there an easier way to do this?
>> >>
>> >> Regards,
>> >> Hiroyuki
>> >
>> >
>> >
>> >
>> > --
>> > Harsh J
>
>
>
>
> --
> Harsh J