hadoop-yarn-issues mailing list archives

From "Subru Krishnan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
Date Wed, 23 Dec 2015 20:50:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070178#comment-15070178 ]

Subru Krishnan commented on YARN-3870:
--------------------------------------

+1 on this.

Thanks [~grey] for raising this. I have been discussing Distributed Scheduling (YARN-2877)
and Federation (YARN-2915) offline with [~asuresh] and [~curino]. In both scenarios,
sending the raw container request and letting the RM expand it would save us a lot of pain,
as we currently find it very difficult to route requests correctly in the AMRMProxy (YARN-2844).


> Providing raw container request information for fine scheduling
> ---------------------------------------------------------------
>
>                 Key: YARN-3870
>                 URL: https://issues.apache.org/jira/browse/YARN-3870
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications, capacityscheduler, fairscheduler, resourcemanager, scheduler, yarn
>            Reporter: Lei Guo
>
> Currently, when the AM sends container requests to the RM and scheduler, it expands each
> individual container request into host/rack/any format. For instance, if I ask for a
> container with the preference "host1, host2, host3", and all three hosts are in the same
> rack rack1, then instead of sending one raw container request with the raw preference list
> to the RM/Scheduler, the client expands it into 5 different objects: host1, host2, host3,
> rack1 and any. By the time the scheduler receives this information, the raw request is
> already lost. This is OK for a single container request, but it causes trouble when dealing
> with multiple container requests from the same application. Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When application requests two containers with different data locality preference:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This ends up as the following container request list when the client sends the request to
> the RM/Scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgement without knowing the
> raw container request. The situation gets worse when dealing with affinity and anti-affinity,
> or even gang scheduling.
> We need some way to provide raw container request information for fine scheduling purposes.
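
The expansion described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the actual YARN client code: the class RawRequestExpansion and the methods expand and exampleTopology are hypothetical names, and the sketch assumes every requested host appears in the host-to-rack map. It counts each host once per request, each rack once per request that touches it, and "any" once per request.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; these names are not part of the YARN API.
final class RawRequestExpansion {

    // Expands raw per-container host preferences into aggregated
    // host/rack/any counts. Assumes every host is present in rackOfHost.
    public static Map<String, Integer> expand(List<List<String>> rawRequests,
                                              Map<String, String> rackOfHost) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> hosts : rawRequests) {
            // Racks touched by this one request; each is counted once per request.
            Set<String> racks = new LinkedHashSet<>();
            for (String host : hosts) {
                counts.merge(host, 1, Integer::sum);
                racks.add(rackOfHost.get(host));
            }
            for (String rack : racks) {
                counts.merge(rack, 1, Integer::sum);
            }
            // One "any" entry per container request.
            counts.merge("any", 1, Integer::sum);
        }
        return counts;
    }

    // The topology from the description: 6 hosts in two racks.
    public static Map<String, String> exampleTopology() {
        return Map.of(
            "host1", "rack1", "host2", "rack1", "host3", "rack1",
            "host4", "rack2", "host5", "rack2", "host6", "rack2");
    }

    public static void main(String[] args) {
        Map<String, String> topology = exampleTopology();

        // c1 and c2 as given in the description.
        Map<String, Integer> original = expand(
            List.of(List.of("host1", "host2", "host4"),
                    List.of("host2", "host3", "host5")),
            topology);

        // A different pair of raw requests (host4 and host5 swapped).
        Map<String, Integer> swapped = expand(
            List.of(List.of("host1", "host2", "host5"),
                    List.of("host2", "host3", "host4")),
            topology);

        System.out.println(original);
        // Both pairs expand to the same counts, so the scheduler cannot
        // tell them apart from the expanded form alone.
        System.out.println(original.equals(swapped));
    }
}
```

With the c1 and c2 requests from the description, the counts match the list above (host2: 2, rack1: 2, rack2: 2, any: 2, and so on). Swapping host4 and host5 between the two requests yields exactly the same expanded counts, which is one concrete way the raw request is lost.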



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
