hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: What is the difference between Rack-local map tasks and Data-local map tasks?
Date Mon, 08 Oct 2012 00:13:15 GMT
Ok, 

So what would be the use case for this feature?

I mean, when would locality take precedence over job completion time?

On Oct 7, 2012, at 5:46 PM, Harsh J <harsh@cloudera.com> wrote:

> Bertrand,
> 
> FairScheduler does support delay scheduling for locality via
> mapred.fairscheduler.locality.delay config prop. MR2's
> CapacityScheduler recently got similar support for better locality
> scheduling as well (see YARN-80). Is this not what you're talking of?
> 
> On Mon, Oct 8, 2012 at 1:01 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>> Basically, more replicas.
>> 
>> The second solution would be to use a 'smarter' scheduler. In theory, the
>> jobtracker should be able to say "postpone this task until a data-local task
>> can be created". But I don't think any stable, publicly available scheduler
>> does that at the moment. This would give you less network traffic, but the
>> whole job might be slower due to the wait. It might be a good trade-off if
>> you have multiple jobs running at the same time and if your hot data is
>> uniformly distributed. But in practice this is of course not always the case,
>> and you also need to consider SLAs for the users, so the whole thing is not
>> trivial.
>> 
>> Regards
>> 
>> Bertrand
>> 
>> 
>> On Sun, Oct 7, 2012 at 5:28 PM, centerqi hu <centerqi@gmail.com> wrote:
>>> 
>>> Very good explanation.
>>> If there were a way to reduce the number of rack-local map tasks
>>> and increase the number of data-local map tasks,
>>> would that improve performance?
>>> 
>>> 2012/10/7 Michael Segel <michael_segel@hotmail.com>
>>>> 
>>>> Rack local means that while the data isn't local to the node running the
>>>> task, it is still on the same rack.
>>>> (It's meaningless unless you've set up rack awareness, because otherwise
>>>> all of the machines are on the default rack.)
>>>> 
>>>> Data local means that the task is running local to the machine that
>>>> contains the actual data.
>>>> 
>>>> HTH
>>>> 
>>>> -Mike
>>>> 
>>>> On Oct 7, 2012, at 8:56 AM, centerqi hu <centerqi@gmail.com> wrote:
>>>> 
>>>> 
>>>> hi all
>>>> 
>>>> When I run "hadoop job -status xxx", the output includes the following lines:
>>>> 
>>>> Rack-local map tasks=124
>>>> Data-local map tasks=6
>>>> 
>>>> What is the difference between Rack-local map tasks and Data-local map
>>>> tasks?
>>>> 
>>>> --
>>>> centerqi@gmail.com|Sam
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> centerqi@gmail.com|齐忠
>> 
>> 
>> 
>> 
>> --
>> Bertrand Dechoux
> 
> 
> 
> -- 
> Harsh J
> 
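
To make the trade-off discussed above concrete, here is a minimal sketch of the two ideas in this thread: classifying a slot as data-local, rack-local, or off-rack, and the "postpone until a data-local slot shows up" behaviour that delay scheduling aims for. It is a toy model, not the FairScheduler's actual code. The `Job` class, `locality_level`, the `rack_of` mapping, and the node names are all hypothetical, and the `delay` argument only plays the role that mapred.fairscheduler.locality.delay plays in the real scheduler (units here are arbitrary).

```python
# Toy model of locality levels and delay scheduling (illustrative only).
LEVELS = ["data-local", "rack-local", "off-rack"]  # best to worst


def locality_level(node, replica_hosts, rack_of):
    """Classify how close `node` is to the hosts holding a task's input."""
    if node in replica_hosts:
        return "data-local"
    if rack_of[node] in {rack_of[h] for h in replica_hosts}:
        return "rack-local"
    return "off-rack"


class Job:
    def __init__(self, pending, delay):
        self.pending = dict(pending)  # task name -> hosts holding its input
        self.delay = delay            # how long to hold out for a data-local slot
        self.waiting_since = None     # time at which the job first declined a slot

    def offer(self, node, rack_of, now):
        """Handle a free slot on `node`.

        Returns (task, level) if a task is launched, or None if the job
        declines the slot and keeps waiting for better locality.
        """
        if not self.pending:
            return None
        # Pick the pending task with the best locality on this node.
        best = min(
            self.pending,
            key=lambda t: LEVELS.index(
                locality_level(node, self.pending[t], rack_of)
            ),
        )
        level = locality_level(node, self.pending[best], rack_of)
        if level != "data-local":
            if self.waiting_since is None:
                self.waiting_since = now
            if now - self.waiting_since < self.delay:
                return None  # decline: a data-local slot may free up soon
        # Simplification: reset the wait clock after every launch.
        self.waiting_since = None
        del self.pending[best]
        return best, level
```

With `rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}` and a job whose only task has its input on n1, a slot offered on n2 is declined until either n1 frees up (the task then runs data-local) or `delay` has elapsed (the task then runs rack-local on n2). That is the trade: less cross-node traffic against a possibly later start for that task.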

