hadoop-yarn-dev mailing list archives

From Ravi Prakash <ravi...@ymail.com>
Subject Re: YARN replica selection
Date Sat, 20 Jun 2015 20:43:40 GMT
Hi Muthu!

Hitesh is correct. The behavior is application-specific in the sense 
that it's the application's AM that asks for containers. Look at 

for MapReduce's behavior.
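A minimal sketch of the AM side, in Java. It is not MapReduce's actual code: the `LocalityRequests` class, the `Split` record, and `buildAsks` are hypothetical names. It assumes the AM already knows, for each task's input split, which hosts hold a replica and which rack each host is on. Each task then asks for a container at every replica host, at each distinct rack of those hosts, and at the wildcard `*` (YARN's name for "anywhere"), with counts aggregated per location name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: how a locality-aware AM could turn replica locations into asks. */
public class LocalityRequests {
    /** Wildcard resource name, as in YARN's "*" (ANY). */
    static final String ANY = "*";

    /** One task's input split: each replica host mapped to its rack (hypothetical type). */
    record Split(Map<String, String> replicaHostToRack) {}

    /**
     * Aggregates container counts per resource name. Each task asks once at
     * every host holding a replica, once at each distinct rack of those hosts,
     * and once at ANY, so the scheduler can fall back from node to rack to anywhere.
     */
    static Map<String, Integer> buildAsks(List<Split> splits) {
        Map<String, Integer> asks = new LinkedHashMap<>();
        for (Split split : splits) {
            for (String host : split.replicaHostToRack().keySet()) {
                asks.merge(host, 1, Integer::sum);
            }
            split.replicaHostToRack().values().stream()
                    .distinct()
                    .forEach(rack -> asks.merge(rack, 1, Integer::sum));
            asks.merge(ANY, 1, Integer::sum);
        }
        return asks;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
                new Split(Map.of("h1", "/r1", "h2", "/r1", "h3", "/r2")),
                new Split(Map.of("h2", "/r1", "h4", "/r2", "h5", "/r2")));
        // Prints the per-location counts; iteration order depends on Map.of.
        System.out.println(buildAsks(splits));
    }
}
```

Keying asks by location name rather than by task is what lets several tasks that share a replica host add up to a single count for that host.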

The YARN ResourceManager's scheduler (e.g. Capacity or Fair) will then 
decide based on the resource requests. Here's some code if you want to 
read it 
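A toy illustration of the scheduler side, assuming outstanding asks are kept as counts per resource name (host, rack, or `*`). When a node with room for one container heartbeats, this hypothetical `ToyLocalityScheduler` prefers a node-local match, then rack-local, then off-switch, and decrements the matched level plus the coarser ones. It is a simplification, not Capacity or Fair scheduler code: the real schedulers layer more policy on top, such as queue limits and delay scheduling (holding off a non-local assignment for a while in the hope of a local one).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy, hypothetical locality matcher; not the Capacity or Fair scheduler code. */
public class ToyLocalityScheduler {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    /** Wildcard resource name, as in YARN's "*" (ANY). */
    static final String ANY = "*";

    // Outstanding container counts keyed by resource name: host, rack, or ANY.
    private final Map<String, Integer> asks;

    ToyLocalityScheduler(Map<String, Integer> asks) {
        this.asks = new HashMap<>(asks);
    }

    /**
     * Called when a node with room for one container heartbeats.
     * Returns the locality of the container handed out, or null if nothing is pending.
     */
    Locality offer(String host, String rack) {
        if (pending(ANY) == 0) {
            return null;
        }
        Locality granted;
        if (pending(host) > 0) {
            granted = Locality.NODE_LOCAL;
            decrement(host);
            decrement(rack);
        } else if (pending(rack) > 0) {
            granted = Locality.RACK_LOCAL;
            decrement(rack);
        } else {
            granted = Locality.OFF_SWITCH;
        }
        decrement(ANY); // every grant consumes one "anywhere" ask
        return granted;
    }

    private int pending(String name) {
        return asks.getOrDefault(name, 0);
    }

    private void decrement(String name) {
        // Never go below zero if the asks table is inconsistent.
        asks.computeIfPresent(name, (k, v) -> v > 0 ? v - 1 : 0);
    }

    public static void main(String[] args) {
        ToyLocalityScheduler s = new ToyLocalityScheduler(Map.of("h1", 1, "/r1", 2, ANY, 3));
        System.out.println(s.offer("h1", "/r1")); // NODE_LOCAL
        System.out.println(s.offer("h2", "/r1")); // RACK_LOCAL
        System.out.println(s.offer("h9", "/r9")); // OFF_SWITCH
        System.out.println(s.offer("h1", "/r1")); // null
    }
}
```

Note that nothing here looks at which replica or storage medium a host holds; the scheduler only sees location names and counts, which is why replica choice ends up being the application's job.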

On 06/19/15 09:02, Hitesh Shah wrote:
> Moving conversation to yarn-dev. BCC’ed hdfs-dev.
> YARN actually does not do anything except give back containers based on what an application
> requested. It is up to each and every application to first figure out where the data is
> located and then make optimal choices based on which node to prefer for scheduling. I believe
> MapReduce has some changes to use potentially memory-based block locations over disk-based
> ones, but I don’t believe there is any significant work in any YARN application that makes
> cost-based decisions based on the storage types of the locations where blocks are available.
> thanks
> — Hitesh
> On Jun 19, 2015, at 12:33 AM, Muthu Ganesh <mutgan7@gmail.com> wrote:
>> Hi,
>> How does YARN decide which replica to use when scheduling a task, or is it
>> random?
>> Does the YARN scheduler give priority to SSD storage types over DISK
>> storage types for the HOT_STORAGE_POLICY when scheduling data-local tasks?
>> Please let me know if this should be posted to the YARN developers mailing list
>> instead.
>> Thanks.
>> Muthu
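Hitesh's point is that nothing in YARN ranks replicas by storage medium, so an application would have to do it itself. Below is a hypothetical sketch of what such a preference could look like inside an AM, assuming the AM already knows the medium of each replica: candidate hosts are ordered by the fastest medium they hold for the block. The `Medium` enum only mirrors the names of HDFS storage types and is not the HDFS class. A ranking like this only matters when a block's replicas sit on different media (e.g. under the ONE_SSD policy); under the HOT policy all replicas are on DISK.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical storage-aware host preference for one block; not an existing YARN/HDFS API. */
public class StorageAwarePreference {
    // Mirrors HDFS storage type names, declared fastest first so ordinal order = preference.
    enum Medium { RAM_DISK, SSD, DISK, ARCHIVE }

    record Replica(String host, Medium medium) {}

    /** Orders candidate hosts by the fastest medium each one holds for the block. */
    static List<String> preferredHosts(List<Replica> replicas) {
        Map<String, Medium> best = new LinkedHashMap<>();
        for (Replica r : replicas) {
            // Keep the faster medium if a host appears more than once.
            best.merge(r.host(), r.medium(), (a, b) -> a.compareTo(b) <= 0 ? a : b);
        }
        List<String> hosts = new ArrayList<>(best.keySet());
        // List.sort is stable, so hosts on the same medium keep their input order.
        hosts.sort(Comparator.comparing((String h) -> best.get(h)));
        return hosts;
    }

    public static void main(String[] args) {
        List<Replica> block = List.of(
                new Replica("h1", Medium.DISK),
                new Replica("h2", Medium.SSD),
                new Replica("h3", Medium.DISK));
        System.out.println(preferredHosts(block)); // [h2, h1, h3]
    }
}
```

An AM could list hosts in this order when building its node-level asks, but since the scheduler treats all named hosts alike, the ordering alone would not guarantee the SSD host wins.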
