hadoop-mapreduce-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data
Date Thu, 03 Nov 2016 12:59:58 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632657#comment-15632657 ]

Hadoop QA commented on MAPREDUCE-6803:

| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} | {color:red} MAPREDUCE-6803 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12742477/YARN-3856.002.patch |
| JIRA Issue | MAPREDUCE-6803 |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6795/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |

This message was automatically generated.

> MR AppMaster should assign container that is closest to the data
> ----------------------------------------------------------------
>                 Key: MAPREDUCE-6803
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6803
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster
>         Environment: Hadoop cluster with multi-level network hierarchy
>            Reporter: jaehoon ko
>              Labels: oct16-medium
>         Attachments: YARN-3856.001.patch, YARN-3856.002.patch
> Currently, given a Container request for a host, ResourceManager allocates a Container with the following priorities (RMContainerAllocator.java):
>  - The requested host
>  - A host in the same rack as the requested host
>  - Any host
> This can lead to sub-optimal allocation if a Hadoop cluster is deployed on hosts connected by a multi-level network (which is typical). For example, suppose a network architecture with one core switch, two aggregate switches, four ToR switches, and 8 hosts. Each switch has two downlinks. The rack IDs of the hosts are as follows:
> h1, h2: /c/a1/t1
> h3, h4: /c/a1/t2
> h5, h6: /c/a2/t3
> h7, h8: /c/a2/t4
> To allocate a container for data on h1, Hadoop first tries h1 itself, then h2, then any of h3 ~ h8. Clearly, h3 and h4 are better choices than h5 ~ h8 in terms of network distance and bandwidth. However, the current implementation picks among h3 ~ h8 with equal probability.
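A minimal Java sketch can make the problem concrete. It assumes each host's rack ID is available as a plain string and models the three-level preference listed above with exact string comparison; the class, enum, and method names are hypothetical and are not taken from RMContainerAllocator or the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not RMContainerAllocator's code): models the three-level
// preference described above using exact string comparison of rack IDs.
public class ExactRackMatch {

    // Illustrative locality levels, from most to least preferred.
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    static Locality classify(String requestedHost, String requestedRack,
                             String candidateHost, String candidateRack) {
        if (candidateHost.equals(requestedHost)) {
            return Locality.NODE_LOCAL;
        }
        // Exact match: "/c/a1/t2" and "/c/a2/t3" are both simply "not /c/a1/t1".
        if (candidateRack.equals(requestedRack)) {
            return Locality.RACK_LOCAL;
        }
        return Locality.OFF_SWITCH;
    }

    // Rack IDs from the example topology in the issue description.
    static Map<String, String> exampleRacks() {
        Map<String, String> racks = new LinkedHashMap<>();
        String[][] rows = {
            {"h1", "/c/a1/t1"}, {"h2", "/c/a1/t1"},
            {"h3", "/c/a1/t2"}, {"h4", "/c/a1/t2"},
            {"h5", "/c/a2/t3"}, {"h6", "/c/a2/t3"},
            {"h7", "/c/a2/t4"}, {"h8", "/c/a2/t4"},
        };
        for (String[] row : rows) {
            racks.put(row[0], row[1]);
        }
        return racks;
    }

    public static void main(String[] args) {
        Map<String, String> racks = exampleRacks();
        for (Map.Entry<String, String> e : racks.entrySet()) {
            System.out.println(e.getKey() + " -> "
                + classify("h1", racks.get("h1"), e.getKey(), e.getValue()));
        }
    }
}
```

Running `main` prints NODE_LOCAL for h1, RACK_LOCAL for h2, and OFF_SWITCH for every host from h3 to h8, so h3 (same aggregate switch) and h8 (across the core switch) fall into the same bucket.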
> This limitation is even more pronounced when Hadoop clusters are deployed on VMs or containers. In that case, only the VMs or containers running on the same physical host are considered rack-local, and hosts that are actually in the same rack are chosen with the same probability as distant hosts.
> The root cause of this limitation is that RMContainerAllocator.java performs exact matching on the rack ID to find a rack-local host. Instead, we can perform longest-prefix matching to find the closest host. Using the same network architecture as above, longest-prefix matching selects hosts with the following priorities:
>  - h1
>  - h2
>  - h3 or h4
>  - h5 or h6 or h7 or h8
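The longest-prefix idea can be sketched in Java as follows. This is an illustration under stated assumptions, not the code from the attached patches: it assumes each host's network path is its rack ID with the host name appended as the leaf (e.g. "/c/a1/t1/h1"), and the class and method names are hypothetical. Hosts are ranked by how many leading path components they share with the host holding the data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of longest-prefix matching on network paths; it is not
// the code from the attached patch. Each path is the host's rack ID with the
// host name appended as the leaf, e.g. "/c/a1/t1/h1".
public class LongestPrefixMatch {

    // Counts the leading path components two paths share,
    // e.g. "/c/a1/t1/h1" and "/c/a1/t2/h3" share "c" and "a1", so the result is 2.
    static int commonPrefixLength(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int n = Math.min(pa.length, pb.length);
        int i = 0;
        while (i < n && pa[i].equals(pb[i])) {
            i++;
        }
        return i;
    }

    // Sorts hosts from closest to farthest relative to the target host.
    // List.sort is stable, so hosts that tie keep their insertion order.
    static List<String> rankByProximity(String target, Map<String, String> paths) {
        String targetPath = paths.get(target);
        List<String> hosts = new ArrayList<>(paths.keySet());
        hosts.sort(Comparator.comparingInt(
                (String h) -> commonPrefixLength(targetPath, paths.get(h))).reversed());
        return hosts;
    }

    // Example topology from the issue description, with host names as leaves.
    static Map<String, String> examplePaths() {
        Map<String, String> paths = new LinkedHashMap<>();
        String[][] rows = {
            {"h1", "/c/a1/t1"}, {"h2", "/c/a1/t1"},
            {"h3", "/c/a1/t2"}, {"h4", "/c/a1/t2"},
            {"h5", "/c/a2/t3"}, {"h6", "/c/a2/t3"},
            {"h7", "/c/a2/t4"}, {"h8", "/c/a2/t4"},
        };
        for (String[] row : rows) {
            paths.put(row[0], row[1] + "/" + row[0]);
        }
        return paths;
    }

    public static void main(String[] args) {
        Map<String, String> paths = examplePaths();
        String target = "h1";
        for (String h : rankByProximity(target, paths)) {
            System.out.println(h + " shares "
                + commonPrefixLength(paths.get(target), paths.get(h))
                + " path components with " + target);
        }
    }
}
```

For h1 the shared-prefix lengths come out as 4 (h1), 3 (h2), 2 (h3, h4) and 1 (h5 to h8), which reproduces the four priority tiers listed above. A real allocator would presumably pick randomly within a tier rather than in insertion order; the stable sort here only keeps the sketch deterministic.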

This message was sent by Atlassian JIRA
