hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: What is the difference between Rack-local map tasks and Data-local map tasks?
Date Sun, 07 Oct 2012 22:46:50 GMT
Bertrand,

FairScheduler does support delay scheduling for locality via the
mapred.fairscheduler.locality.delay config property. MR2's
CapacityScheduler recently got similar support for better locality
scheduling as well (see YARN-80). Is this not what you are describing?
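
To illustrate the idea, here is a minimal Python sketch of locality
classification plus a delay-scheduling decision. It is not the
FairScheduler's actual code: the topology map, the Job class, and the
functions locality() and offer_slot() are all hypothetical names, and
the single wait threshold is a simplification of what a real scheduler
does.

```python
from dataclasses import dataclass

# Hypothetical cluster layout: node name -> rack name.
TOPOLOGY = {"n1": "rackA", "n2": "rackA", "n3": "rackB"}


def locality(node, replica_nodes, topology=TOPOLOGY):
    """Classify a task placement relative to the nodes holding its input block."""
    if node in replica_nodes:
        return "data-local"   # the task runs on a node that stores the data
    if topology[node] in {topology[r] for r in replica_nodes}:
        return "rack-local"   # different node, same rack as a replica
    return "off-rack"


@dataclass
class Job:
    replica_nodes: set
    waited_ms: int = 0  # time spent so far declining non-local slots


def offer_slot(job, node, elapsed_ms, locality_delay_ms):
    """Return the locality level if the job accepts the slot, else None.

    elapsed_ms is the time since the previous offer made to this job.
    locality_delay_ms plays the role of the configured locality delay
    (an assumption about how such a setting would be applied).
    """
    level = locality(node, job.replica_nodes)
    if level == "data-local":
        job.waited_ms = 0
        return level
    job.waited_ms += elapsed_ms
    if job.waited_ms >= locality_delay_ms:
        return level  # waited long enough: accept a less local slot
    return None       # keep waiting for a data-local slot


job = Job(replica_nodes={"n1"})
print(offer_slot(job, "n2", 3000, 5000))  # declined, still waiting
print(offer_slot(job, "n2", 3000, 5000))  # accepted as rack-local
```

If every node in TOPOLOGY maps to the same rack, which is the situation
without rack awareness, every placement that is not data-local is
classified as rack-local.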

On Mon, Oct 8, 2012 at 1:01 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
> Basically, more replicas.
>
> The second solution would be to use a 'smarter' scheduler. In theory, the
> jobtracker should be able to say "postpone this task until a data-local task
> can be created". But I don't think any stable, publicly available scheduler
> does that at the moment. This would give you less network traffic, but the
> whole job might be slower because of the wait. It might be a good trade-off
> if you have multiple jobs running at the same time and if your hot data is
> uniformly distributed. In practice, of course, this is not always the case,
> and you also need to consider SLAs for the users, so the problem as a whole
> is not trivial.
>
> Regards
>
> Bertrand
>
>
> On Sun, Oct 7, 2012 at 5:28 PM, centerqi hu <centerqi@gmail.com> wrote:
>>
>> Very good explanation.
>> If there were a way to reduce the number of rack-local map tasks
>> and increase the number of data-local map tasks,
>> would that improve performance?
>>
>> 2012/10/7 Michael Segel <michael_segel@hotmail.com>
>>>
>>> Rack local means that while the data isn't local to the node running the
>>> task, it is still on the same rack.
>>> (It's meaningless unless you've set up rack awareness, because otherwise
>>> all of the machines are on the default rack.)
>>>
>>> Data local means that the task is running local to the machine that
>>> contains the actual data.
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>> On Oct 7, 2012, at 8:56 AM, centerqi hu <centerqi@gmail.com> wrote:
>>>
>>>
>>> Hi all,
>>>
>>> When I run "hadoop job -status xxx", the output includes the following
>>> counters:
>>>
>>> Rack-local map tasks=124
>>> Data-local map tasks=6
>>>
>>> What is the difference between Rack-local map tasks and Data-local map
>>> tasks?
>>>
>>> --
>>> centerqi@gmail.com|Sam
>>>
>>>
>>
>>
>>
>> --
>> centerqi@gmail.com|齐忠
>
>
>
>
> --
> Bertrand Dechoux



-- 
Harsh J
