hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: Yarn related questions:
Date Fri, 06 Jan 2012 18:45:26 GMT
Responses inline:

On Jan 6, 2012, at 9:34 AM, Ann Pal wrote:

> Thanks for your reply. Some additional questions:
> [1] How does the application master determine the size (memory requirement) of the
> container? Can the container be viewed as a JVM with CPU and memory?

Pretty much. It's related to the size of the JVM or any Unix process you want to run.
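
As a rough illustration only (this is not YARN code), the sizing could be sketched as the JVM heap plus some non-heap overhead, rounded up to an allocation increment. The function name, the 256 MB overhead and the 1024 MB increment are all assumptions made for the example.

```python
import math


def container_memory_mb(jvm_heap_mb, overhead_mb=256, min_allocation_mb=1024):
    """Hypothetical sizing rule: heap + non-heap overhead, rounded up.

    overhead_mb and min_allocation_mb are assumed values for illustration,
    not defaults taken from any Hadoop release.
    """
    needed_mb = jvm_heap_mb + overhead_mb
    # Round up to the next multiple of the allocation increment.
    return math.ceil(needed_mb / min_allocation_mb) * min_allocation_mb


# A task that wants a 1536 MB heap would ask for a 2048 MB container here.
request_mb = container_memory_mb(1536)
```

The same arithmetic applies to any Unix process: estimate its peak memory and request a container at least that large.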

> [2] The document mentions a concept of fungibility of resources across servers. Could an
> allocated container of 2 GB of RAM for a reducer be spread across two servers of 1 GB each?
> If so, is a task split across 2 servers? Not sure how that works.

It means 'fungibility' across map and reduce tasks, i.e. there are no more fixed map/reduce
slots. A container can't be split across servers.
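
A toy sketch of the idea, assuming each node is just a pool of free memory (the node names and the first-fit policy are made up for the example and are not how the real scheduler chooses nodes): any task type can use the pool, but a container has to fit entirely on one node.

```python
def place_container(free_mb_by_node, request_mb):
    """Return the first node that can hold the whole container, else None.

    Hypothetical first-fit placement; there are no map or reduce slots,
    only free memory, and a request is never split across nodes.
    """
    for node, free_mb in free_mb_by_node.items():
        if free_mb >= request_mb:
            free_mb_by_node[node] = free_mb - request_mb
            return node
    return None


nodes = {"node1": 1024, "node2": 1024, "node3": 4096}

# A 2 GB reducer cannot be stitched together from node1 + node2.
reducer_node = place_container(nodes, 2048)
# A 1 GB mapper draws from the same kind of pool on whichever node has room.
mapper_node = place_container(nodes, 1024)
```

With only two 1 GB nodes available, the 2 GB request would simply stay unplaced until a single node has enough free memory.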

> [3] The application master corresponds to the JobTracker for a given job, and the
> NodeManager corresponds to the TaskTracker in pre-0.23 Hadoop. Is this assumption correct?

Pretty much, except that the AM doesn't do any of the resource management the JT used to
do; that's done by the ResourceManager.

> [4] For data to be transferred from map->reduce node, is it the reduce node's "node
> manager" that periodically polls the application master and subsequently pulls map data
> from the completed map nodes?

No, the reduce task itself fetches map outputs.

The reduce task polls the AM to get information about 'where' map outputs are available.
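
Very roughly, and only as a sketch (the class, method and parameter names below are hypothetical, not the real MapReduce interfaces), the flow looks like this: the reduce task asks the AM which maps have finished and where, then pulls each output from that host itself.

```python
class ToyAppMaster:
    """Stand-in for the AM: it only records where finished map outputs live."""

    def __init__(self):
        self._completed = []  # list of (map_id, host) in completion order

    def map_finished(self, map_id, host):
        self._completed.append((map_id, host))

    def completed_maps_since(self, index):
        # The reducer passes how many events it has already seen.
        return self._completed[index:]


def run_reduce_fetch(am, total_maps, fetch, max_polls=10):
    """Poll the AM for map output locations and fetch each output directly.

    fetch(host, map_id) is a caller-supplied function standing in for the
    actual transfer from the node that holds the map output.
    """
    fetched = {}
    seen = 0
    for _ in range(max_polls):
        events = am.completed_maps_since(seen)
        seen += len(events)
        for map_id, host in events:
            fetched[map_id] = fetch(host, map_id)
        if len(fetched) == total_maps:
            break
    return fetched


am = ToyAppMaster()
am.map_finished("m0", "hostA")
am.map_finished("m1", "hostB")
outputs = run_reduce_fetch(am, 2, lambda host, map_id: f"{map_id}@{host}")
```

Note that the AM only hands out locations; the data itself never passes through it or through the NodeManager in this sketch.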

