hadoop-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: What is the difference between Rack-local map tasks and Data-local map tasks?
Date Mon, 08 Oct 2012 05:44:53 GMT
@Harsh: I didn't know. That's good to hear. I will check out the
mapred.fairscheduler.locality.delay property in FairScheduler,
and I will also look at YARN-80 for my own information.

Thanks!

Bertrand
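
For readers following the thread, the "postpone this task until a data-local task can be created" idea quoted below can be sketched roughly as follows. This is only an illustration of delay scheduling in general, not FairScheduler's actual code: the Job class, the assign function, and the locality_delay parameter are hypothetical names, and time is modelled as plain seconds for simplicity.

```python
class Job:
    """Hypothetical model of a job with pending map tasks (not a Hadoop class)."""

    def __init__(self, name, tasks):
        self.name = name
        # task_id -> set of hosts holding a replica of the task's input split
        self.tasks = dict(tasks)
        # Time at which the job first declined a slot for lack of local data
        self.skipped_since = None


def assign(job, free_host, now, locality_delay):
    """Return (task_id, locality) to launch on free_host, or None to keep waiting."""
    # 1. Prefer a task whose input has a replica on the host offering a slot.
    local = next((t for t, hosts in job.tasks.items() if free_host in hosts), None)
    if local is not None:
        job.tasks.pop(local)
        job.skipped_since = None  # locality achieved, so reset the wait timer
        return local, "data-local"

    if not job.tasks:
        return None  # nothing left to schedule

    # 2. No local task for this host: start (or continue) waiting.
    if job.skipped_since is None:
        job.skipped_since = now
    if now - job.skipped_since < locality_delay:
        return None  # decline the slot and hope a better host turns up

    # 3. Waited long enough: accept a non-local slot rather than stall the job.
    task_id = next(iter(job.tasks))
    job.tasks.pop(task_id)
    return task_id, "non-local"


if __name__ == "__main__":
    job = Job("demo", {"t1": {"h1", "h2"}, "t2": {"h3"}})
    for host, now in [("h9", 0.0), ("h3", 1.0), ("h9", 2.0), ("h9", 8.0)]:
        print(host, now, assign(job, host, now, locality_delay=5.0))
```

A larger locality_delay trades job latency for less network traffic, which is the trade-off discussed further down the thread.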

On Mon, Oct 8, 2012 at 2:13 AM, Michael Segel <michael_segel@hotmail.com> wrote:

> Ok,
>
> So what would be the use case for this feature?
>
> I mean, when would locality take precedence over job completion time?
>
> On Oct 7, 2012, at 5:46 PM, Harsh J <harsh@cloudera.com> wrote:
>
> > Bertrand,
> >
> > FairScheduler does support delay scheduling for locality via
> > mapred.fairscheduler.locality.delay config prop. MR2's
> > CapacityScheduler recently got similar support for better locality
> > scheduling as well (see YARN-80). Is this not what you're talking of?
> >
> > On Mon, Oct 8, 2012 at 1:01 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
> >> Basically, more replicas.
> >>
> >> The second solution would be to use a 'smarter' scheduler. In theory, the
> >> jobtracker should be able to say "postpone this task until a data-local
> >> task can be created". But I don't think any stable and publicly available
> >> scheduler does that at the moment. This would give you less network
> >> traffic, but the whole job might be slower due to the wait. It might be a
> >> good trade-off if you have multiple jobs running at the same time and if
> >> your hot data is uniformly distributed. But in practice this is of course
> >> not always the case, and you also need to consider SLAs for the users, so
> >> the whole thing is not trivial.
> >>
> >> Regards
> >>
> >> Bertrand
> >>
> >>
> >> On Sun, Oct 7, 2012 at 5:28 PM, centerqi hu <centerqi@gmail.com> wrote:
> >>>
> >>> Very good explanation.
> >>> If there were a way to reduce Rack-local map tasks
> >>> and increase Data-local map tasks,
> >>> would that improve performance?
> >>>
> >>> 2012/10/7 Michael Segel <michael_segel@hotmail.com>
> >>>>
> >>>> Rack local means that while the data isn't local to the node running
> >>>> the task, it is still on the same rack.
> >>>> (It's meaningless unless you've set up rack awareness, because otherwise
> >>>> all of the machines are on the default rack.)
> >>>>
> >>>> Data local means that the task is running local to the machine that
> >>>> contains the actual data.
> >>>>
> >>>> HTH
> >>>>
> >>>> -Mike
> >>>>
> >>>> On Oct 7, 2012, at 8:56 AM, centerqi hu <centerqi@gmail.com> wrote:
> >>>>
> >>>>
> >>>> hi all
> >>>>
> >>>> When I run "hadoop job -status xxx", the output includes the following lines:
> >>>>
> >>>> Rack-local map tasks=124
> >>>> Data-local map tasks=6
> >>>>
> >>>> What is the difference between Rack-local map tasks and Data-local map
> >>>> tasks?
> >>>>
> >>>> --
> >>>> centerqi@gmail.com|Sam
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> centerqi@gmail.com|齐忠
> >>
> >>
> >>
> >>
> >> --
> >> Bertrand Dechoux
> >
> >
> >
> > --
> > Harsh J
> >
>
>
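
As a recap of the definitions given in the quoted thread, here is a minimal sketch of how a map task's locality could be classified. It is an illustration only, not Hadoop's implementation: classify_locality and the rack_of mapping are hypothetical, and the "/default-rack" fallback is an assumption used to mirror the remark that, without rack awareness, all machines sit on one default rack.

```python
DEFAULT_RACK = "/default-rack"  # assumed fallback when no rack mapping is configured


def classify_locality(task_host, replica_hosts, rack_of):
    """Classify a map task by how close it runs to its input data.

    task_host:     host running the task
    replica_hosts: hosts holding a replica of the task's input block
    rack_of:       dict mapping host -> rack path (empty if no rack awareness)
    """
    # Data-local: the task runs on a machine that holds the data itself.
    if task_host in replica_hosts:
        return "data-local"

    # Rack-local: the data is on another machine in the same rack.
    task_rack = rack_of.get(task_host, DEFAULT_RACK)
    if any(rack_of.get(h, DEFAULT_RACK) == task_rack for h in replica_hosts):
        return "rack-local"

    # Otherwise the input has to be read from a different rack.
    return "off-rack"


if __name__ == "__main__":
    racks = {"h1": "/rack1", "h2": "/rack1", "h3": "/rack2"}
    print(classify_locality("h1", {"h1", "h3"}, racks))
    print(classify_locality("h2", {"h1"}, racks))
    print(classify_locality("h3", {"h1"}, racks))
    # With no rack mapping, every non-local task lands on the default rack
    # and is therefore counted as rack-local.
    print(classify_locality("h3", {"h1"}, {}))
```

The last call shows why a high Rack-local count says little on a cluster without rack awareness: every task that is not data-local falls into that bucket.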


-- 
Bertrand Dechoux
