hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1200) Provide a central view for rack topologies
Date Mon, 16 Sep 2013 11:07:53 GMT

    [ https://issues.apache.org/jira/browse/YARN-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768228#comment-13768228
] 

Junping Du commented on YARN-1200:
----------------------------------

bq. A reasonable regression-fixing first step is to match that of the HDFS functionality:
Each NameNode (and NameNode alone) needs the rack resolution script, not all the DNs.
Right. I agree this is the first step we should address. However, I don't think it is a
simple fix on which we can decide quickly, for the reasons below.
YARN, unlike MRv1, separates scheduling into two layers (resource scheduling and task
scheduling) for simplicity. For resource scheduling, the AM translates tasks (TaskAttempts)
from a job into ContainerRequests, encodes them as ResourceRequests, and sends those to the
RM scheduler for container allocation. For task scheduling, the AM assigns map tasks to the
allocated containers based on locality. In the first step, the AM resolves topology to create
several ResourceRequests (typically 3: node, rack and ANY/*) from one ContainerRequest that
asks for specific nodes. The second step is decided by the AM alone, which means the AM has
to understand the topology at least for some specific nodes. We could avoid resolving
topology in the first step by simplifying the ResourceRequest exchange between AM and RM,
i.e. sending only 1 (node) ResourceRequest instead of 3, but we would still need to resolve
it in the second step.
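
To make the first step concrete, here is a minimal sketch of expanding one node-level
request into the three requests mentioned above (node, rack, ANY). It is not the MR AM
code: SimpleResourceRequest, expand() and the resolver function are hypothetical names
used only for illustration, and the resolver stands in for whatever maps a host to its
rack. The only value taken from YARN itself is "*" as the ANY resource name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative only: a simplified stand-in for the AM-side request expansion.
final class RequestExpansionSketch {
    // YARN uses "*" as the resource name that matches any node.
    static final String ANY = "*";

    // Hypothetical, simplified stand-in for a YARN ResourceRequest.
    static final class SimpleResourceRequest {
        final String resourceName; // host name, rack name, or ANY
        final int numContainers;

        SimpleResourceRequest(String resourceName, int numContainers) {
            this.resourceName = resourceName;
            this.numContainers = numContainers;
        }
    }

    // Expands a request for containers on one host into node, rack and ANY requests.
    // rackResolver is an assumed host -> rack mapping; this is where the AM needs
    // topology knowledge.
    static List<SimpleResourceRequest> expand(
            String host, int numContainers, Function<String, String> rackResolver) {
        List<SimpleResourceRequest> requests = new ArrayList<>();
        requests.add(new SimpleResourceRequest(host, numContainers));
        requests.add(new SimpleResourceRequest(rackResolver.apply(host), numContainers));
        requests.add(new SimpleResourceRequest(ANY, numContainers));
        return requests;
    }
}
```

The rackResolver argument is the only place topology enters this step, which is why
sending just the node-level request would remove the lookup here but not in task scheduling.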
Given that we cannot get rid of topology resolution in the AM, we may prefer a cache over
running the script on each node. I have a draft proposal below:
- Set up a cache of <node, network_location> in the AM.
- From each AM-RM heartbeat response, the AM gets topology info for the related nodes (nodes
in the request and in assigned containers) and adds it to the cache.
- Replace all RackResolver calls on the AM side with cache lookups.
- On a cache miss while building a node's rack-level ResourceRequest, set the rack location
to a placeholder (like: UNKNOWN); the RM will then replace it with the correct rack info,
and a later heartbeat response will fill the cache.
- The cache is refreshed as well as filled. So if the topology changes and the RM is aware
of it, the change may not reach all AMs immediately, but it will gradually reach the related
AMs (those that request the node's resources or are assigned containers on it).
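
A rough sketch of the AM-side cache described in the list above, under stated assumptions:
TopologyCacheSketch and its methods are hypothetical names, and the heartbeat response is
assumed to carry a node -> network_location map, which is the proposed addition and not an
existing field. The "UNKNOWN" placeholder follows the example given in the proposal.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: node -> network location cache held by the AM.
final class TopologyCacheSketch {
    // Placeholder sent to the RM on a cache miss; the RM is expected to substitute
    // the real rack (assumed behavior from the proposal).
    static final String UNKNOWN_RACK = "UNKNOWN";

    private final Map<String, String> nodeToRack = new ConcurrentHashMap<>();

    // Replaces a RackResolver call: returns the cached rack or the placeholder.
    String rackFor(String node) {
        return nodeToRack.getOrDefault(node, UNKNOWN_RACK);
    }

    // Called with the topology info assumed to arrive in each heartbeat response.
    // putAll both fills missing entries and refreshes entries whose rack changed.
    void updateFromHeartbeat(Map<String, String> reportedTopology) {
        nodeToRack.putAll(reportedTopology);
    }

    boolean isCached(String node) {
        return nodeToRack.containsKey(node);
    }
}
```

With this shape, an AM only learns about topology changes for nodes it asks for or is
assigned, which matches the gradual-update behavior described in the last bullet.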
Thoughts?  

                
> Provide a central view for rack topologies
> ------------------------------------------
>
>                 Key: YARN-1200
>                 URL: https://issues.apache.org/jira/browse/YARN-1200
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Harsh J
>
> It appears that with YARN, any AM (such as the MRv2 AM) that tries to do rack-info-based
work, will need to resolve racks locally rather than get rack info from YARN directly: https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#L1054
and its use of a simple implementation of https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java
> This is a regression, as we've traditionally only had users maintain rack mappings and
its associated script on a single master role node (JobTracker), not at every compute node.
Task spawning hosts have never done/needed rack resolution of their own.
> It is silly to have to maintain rack configs and their changes on all nodes. We should
have the RM host a stable interface service so that there's only a single view of the topology
across the cluster, and document for AMs to use that instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
