From: Michael Segel
Subject: Re: Assigning reduce tasks to specific nodes
Date: Wed, 28 Nov 2012 09:46:48 -0600
To: user@hadoop.apache.org

Mappers?

Uhm... yes you can do it.
Yes, it is non-trivial.
Yes, it is not recommended.

I think we talk a bit about this in an InfoQ article written by Boris Lublinsky.

It's kind of wild when your entire cluster map goes red in Ganglia... :-)

On Nov 28, 2012, at 2:41 AM, Harsh J <harsh@cloudera.com> wrote:
Hi,

Mapper scheduling is indeed influenced by the results returned by the InputSplit's getLocations().

The map task itself does not care about deserializing the location information, as it is of no use to it. The location information is vital to the scheduler (or in 0.20.2, the JobTracker), to which it is sent directly when a job is submitted. The locations are used pretty well here.

You should be able to control (or rather, influence) mapper placement by working with the InputSplits, but not strictly so, because in the end it's up to your MR scheduler to do data-local or non-data-local assignments.
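
To make the "influence, not control" point concrete, here is a minimal sketch in plain Java. It is a toy model only: `Split`, `pickSplitFor`, and the node names are hypothetical and are not Hadoop classes or the actual JobTracker logic. It assumes a scheduler that, when a host reports a free slot, prefers a pending split listing that host among its locations and otherwise falls back to any pending split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are Hadoop classes.
class LocalityHintDemo {

    /** Hypothetical stand-in for an input split: a name plus preferred hosts. */
    static final class Split {
        final String name;
        final List<String> preferredHosts; // plays the role of the getLocations() result

        Split(String name, String... preferredHosts) {
            this.name = name;
            this.preferredHosts = Arrays.asList(preferredHosts);
        }
    }

    /**
     * Called when a host has a free slot. Returns a pending split that lists
     * this host if one exists (data-local); otherwise returns the first
     * pending split (non-data-local). Returns null when nothing is pending.
     */
    static Split pickSplitFor(String host, List<Split> pending) {
        if (pending.isEmpty()) {
            return null;
        }
        for (Split s : pending) {
            if (s.preferredHosts.contains(host)) {
                pending.remove(s); // safe: we return right away, no further iteration
                return s;
            }
        }
        return pending.remove(0); // no local work for this host, so take any split
    }

    public static void main(String[] args) {
        List<Split> pending = new ArrayList<>(Arrays.asList(
                new Split("split-0", "node-a"),
                new Split("split-1", "node-b"),
                new Split("split-2", "node-a")));

        // Slots free up in this order; node-c holds none of the data.
        for (String host : new String[] {"node-b", "node-c", "node-a"}) {
            Split s = pickSplitFor(host, pending);
            System.out.println(host + " runs " + s.name);
        }
    }
}
```

In this run split-0 names node-a as its location but ends up on node-c, because node-c asked for work first and had nothing local to take. That is the sense in which the locations are a hint to the scheduler rather than a strict placement.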


On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada <mogwaing@gmail.com> wrote:
Hi Harsh,

Thank you for the information.
I understand the current circumstances.

How about for mappers?
As far as I tested, location information in InputSplit is ignored in 0.20.2,
so there seems to be no easy way to assign mappers to specific nodes.
(I checked the source earlier and noticed that
location information is not restored when deserializing the InputSplit
instance.)

Thanks,
Hiroyuki

On Wed, Nov 28, 2012 at 2:08 PM, Harsh J <harsh@cloudera.com> wrote:
> This is not supported/available currently even in MR2, but take a look at
> https://issues.apache.org/jira/browse/MAPREDUCE-199.
>
>
> On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada <mogwaing@gmail.com> wrote:
>>
>> Hi,
>>
>> I am wondering how I can assign reduce tasks to specific nodes.
>> What I want to do is, for example, assigning the reducer which produces
>> part-00000 to node xxx000,
>> and part-00001 to node xxx001 and so on.
>>
>> I think it's about task assignment scheduling, but
>> I am not sure where to customize to achieve this.
>> Is this done by writing some extensions?
>> Or is there an easier way to do this?
>>
>> Regards,
>> Hiroyuki
>
>
>
>
> --
> Harsh J



--
Harsh J
