From: Bertrand Dechoux
To: user@hadoop.apache.org
Date: Mon, 8 Oct 2012 07:44:53 +0200
Subject: Re: What is the difference between Rack-local map tasks and Data-local map tasks?
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=bcaec51a81ec86c87a04cb85b8b6

--bcaec51a81ec86c87a04cb85b8b6
Content-Type: text/plain; charset=UTF-8

@Harsh: I didn't know. That's good to hear. I will check out mapred.fairscheduler.locality.delay in FairScheduler.
And I will also look at YARN-80 for my personal information.
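
For anyone following the thread, here is a toy sketch of the delay-scheduling idea as I understand it from the discussion: when a slot frees up, prefer a data-local task, and only accept a non-local one after the job has waited longer than some delay. This is not the FairScheduler code; `PendingJob`, `pick_task` and `locality_delay` are hypothetical names made up for illustration, and the time unit is whatever the caller passes in.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PendingJob:
    """Hypothetical stand-in for a job with map tasks waiting to run."""
    # Maps a task id to the hosts holding a replica of its input split.
    task_hosts: Dict[str, List[str]]
    # Time at which the job first declined a slot for lack of local data.
    skipped_since: Optional[float] = None


def pick_task(job: PendingJob, free_host: str, now: float,
              locality_delay: float) -> Optional[str]:
    """Return a task id to launch on free_host, or None to keep waiting."""
    # 1. Prefer a task whose input has a replica on the free host.
    for task_id, hosts in job.task_hosts.items():
        if free_host in hosts:
            job.skipped_since = None  # a local launch resets the wait
            del job.task_hosts[task_id]
            return task_id
    if not job.task_hosts:
        return None  # nothing left to schedule
    # 2. No data-local task here: start (or continue) the waiting period.
    if job.skipped_since is None:
        job.skipped_since = now
    if now - job.skipped_since < locality_delay:
        return None  # decline the slot and hope for a better one
    # 3. Waited long enough: give up on locality and take any task.
    task_id = next(iter(job.task_hosts))
    del job.task_hosts[task_id]
    return task_id
```

With a delay of 0 this degenerates to "take whatever slot is offered", which is the trade-off Michael asks about: a larger delay buys locality at the cost of idle slots.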

Thanks!

Bertrand

On Mon, Oct 8, 2012 at 2:13 AM, Michael Segel <michael_segel@hotmail.com> wrote:
Ok,

So what would be the use case for this feature?

I mean, when would locality take precedence over job completion time?

On Oct 7, 2012, at 5:46 PM, Harsh J <harsh@cloudera.com> wrote:

> Bertrand,
>
> FairScheduler does support delay scheduling for locality via
> mapred.fairscheduler.locality.delay config prop. MR2's
> CapacityScheduler recently got similar support for better locality
> scheduling as well (see YARN-80). Is this not what you're talking of?
>
> On Mon, Oct 8, 2012 at 1:01 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>> Basically, more replicas.
>>
>> The second solution would be to use a 'smarter' scheduler. In theory, the
>> jobtracker should be able to say "postpone this task until a data-local task
>> can be created". But I don't think any stable and publicly available scheduler
>> does that at the moment. This would allow you to have less network traffic, but the
>> whole job might be slower due to the wait. It might be a good trade-off if you
>> have multiple jobs running at the same time and if your hot data is
>> uniformly distributed. But in practice this is of course not always the case,
>> and you also need to consider SLAs for the users, so the whole thing is not trivial.
>>
>> Regards
>>
>> Bertrand
>>
>>
>> On Sun, Oct 7, 2012 at 5:28 PM, centerqi hu <centerqi@gmail.com> wrote:
>>>
>>> Very good explanation,
>>> If there is a way to reduce the number of Rack-local map tasks
>>> and increase the number of Data-local map tasks,
>>> would that improve performance?
>>>
>>> 2012/10/7 Michael Segel <michael_segel@hotmail.com>
>>>>
>>>> Rack local means that while the data isn't local to the node running the
>>>> task, it is still on the same rack.
>>>> (It's meaningless unless you've set up rack awareness because all of the
>>>> machines are on the default rack.)
>>>>
>>>> Data local means that the task is running local to the machine that
>>>> contains the actual data.
>>>>
>>>> HTH
>>>>
>>>> -Mike
>>>>
>>>> On Oct 7, 2012, at 8:56 AM, centerqi hu <centerqi@gmail.com> wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> When I run "hadoop job -status xxx", the output includes the following:
>>>>
>>>> Rack-local map tasks=124
>>>> Data-local map tasks=6
>>>>
>>>> What is the difference between Rack-local map tasks and Data-local map
>>>> tasks?
>>>>
>>>> --
>>>> centerqi@gmail.com|Sam
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> centerqi@gmail.com|齐忠
>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>
>
>
> --
> Harsh J
>




--
Bertrand Dechoux
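
PS: the distinction Mike describes in the quoted thread (data-local versus rack-local) can be written down as a small classification. This is only an illustrative sketch: `locality`, `rack_of` and the host names are hypothetical, and the rack mapping is assumed to be a plain dictionary rather than anything Hadoop actually exposes in this form.

```python
from typing import Dict, List

# Assumed fallback rack for hosts with no entry in the mapping, mimicking
# a cluster where rack awareness has not been configured.
DEFAULT_RACK = "/default-rack"


def locality(task_host: str, replica_hosts: List[str],
             rack_of: Dict[str, str]) -> str:
    """Classify a map task by where it runs relative to its input replicas."""
    # Data-local: the task runs on a machine that holds the data itself.
    if task_host in replica_hosts:
        return "data-local"
    # Rack-local: the data is on another machine in the same rack.
    task_rack = rack_of.get(task_host, DEFAULT_RACK)
    if any(rack_of.get(h, DEFAULT_RACK) == task_rack for h in replica_hosts):
        return "rack-local"
    # Otherwise the input has to cross racks.
    return "off-rack"
```

Note that with an empty `rack_of`, every non-data-local task comes out as "rack-local", which matches Mike's remark that the counter means little without rack awareness.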
--bcaec51a81ec86c87a04cb85b8b6--