hadoop-yarn-issues mailing list archives

From "Subru Krishnan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
Date Tue, 05 Jan 2016 03:28:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082312#comment-15082312 ]

Subru Krishnan commented on YARN-3870:
--------------------------------------

Regarding the ID, I am in principle fine with asking the AM to set it. We also have the option
of reusing the _responseID_ of *AllocateRequest*, which both the RM and the AM maintain today.
It would be good to also link the _responseID_ to the actual allocated container in *AllocateResponse*,
as this is a useful hint for the AMs. In fact, this has been requested by [~markus.weimer] to simplify
certain bookkeeping for the [REEF | http://reef.apache.org/ ] AM.
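To illustrate the kind of bookkeeping such a link could simplify, here is a hypothetical AM-side sketch. Everything in it is an assumption for illustration: the `response_id` field on `AllocatedContainer` stands in for the proposed link between the _responseID_ and the allocated container in *AllocateResponse*, and `RequestBook` is not part of any YARN or REEF API. Because all asks sent in one *AllocateRequest* share a single _responseID_, the sketch tracks an outstanding count per _responseID_ rather than per individual ask.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AllocatedContainer:
    """Hypothetical view of a container returned in an AllocateResponse."""
    container_id: str
    host: str
    # Hypothetical field: the responseID of the AllocateRequest that carried
    # the ask this container satisfies (the link proposed in the comment).
    response_id: int


@dataclass
class RequestBook:
    """Hypothetical AM-side bookkeeping keyed by responseID."""
    # responseID -> number of containers still outstanding for that request
    pending: Dict[int, int] = field(default_factory=dict)
    # container ID -> responseID it was matched to
    assigned: Dict[str, int] = field(default_factory=dict)

    def record_ask(self, response_id: int, count: int) -> None:
        """Remember that `count` containers were asked for under `response_id`."""
        self.pending[response_id] = self.pending.get(response_id, 0) + count

    def on_allocated(self, container: AllocatedContainer) -> None:
        """Match an allocated container to its ask directly via the echoed ID,
        instead of inferring the match from the container's host or rack."""
        rid = container.response_id
        if self.pending.get(rid, 0) <= 0:
            raise ValueError(f"no outstanding ask for responseID {rid}")
        self.pending[rid] -= 1
        self.assigned[container.container_id] = rid


# Usage: two containers asked in the AllocateRequest with responseID 7.
book = RequestBook()
book.record_ask(response_id=7, count=2)
book.on_allocated(AllocatedContainer("container_01", "host2", response_id=7))
print(book.pending)   # {7: 1}
print(book.assigned)  # {'container_01': 7}
```

With the ID echoed back, the AM never has to guess which of its outstanding asks a container on host2 was meant for.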

> Providing raw container request information for fine scheduling
> ---------------------------------------------------------------
>
>                 Key: YARN-3870
>                 URL: https://issues.apache.org/jira/browse/YARN-3870
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications, capacityscheduler, fairscheduler, resourcemanager, scheduler, yarn
>            Reporter: Lei Guo
>            Assignee: Karthik Kambatla
>
> Currently, when the AM sends container requests to the RM and scheduler, it expands each individual container request into host/rack/any format. For instance, if I ask for a container with the preference "host1, host2, host3", and all three hosts are in the same rack rack1, then instead of sending one raw container request with the raw preference list to the RM/scheduler, the AM expands it into 5 different objects: host1, host2, host3, rack1 and any. By the time the scheduler receives this information, the raw request has already been lost. This is OK for a single container request, but it causes trouble when dealing with multiple container requests from the same application. Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When the application requests two containers with different data locality preferences:
> c1: host1, host2, host4
> c2: host2, host3, host5
> When the client sends the request to the RM/scheduler, this ends up as the following container request list:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgement without knowing the raw container requests. The situation gets worse when dealing with affinity and anti-affinity, or even gang scheduling, etc.
> We need some way to provide raw container request information for fine scheduling purposes.
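The expansion described in the quoted issue can be reproduced with a small model. This is a simplified sketch, not the actual AM client code: it assumes that each preferred host contributes one ask, each distinct rack covering those hosts contributes one ask, and `any` contributes one ask per container, as the issue describes. The function name `expand` and the `RACK_OF` topology table are hypothetical. The sketch also flattens a second pair of raw requests, with host4 and host5 swapped between c1 and c2, to show that both pairs produce identical counts.

```python
from collections import Counter

# Topology from the example: 6 hosts in two racks (hypothetical table).
RACK_OF = {
    "host1": "rack1", "host2": "rack1", "host3": "rack1",
    "host4": "rack2", "host5": "rack2", "host6": "rack2",
}


def expand(raw_requests, rack_of=RACK_OF):
    """Flatten raw per-container host preferences into host/rack/any counts.

    Each raw request is one container with a list of preferred hosts.
    Every preferred host gets +1, every distinct rack covering those hosts
    gets +1, and "any" gets +1 per container.
    """
    counts = Counter()
    for hosts in raw_requests:
        for host in hosts:
            counts[host] += 1
        for rack in {rack_of[h] for h in hosts}:
            counts[rack] += 1
        counts["any"] += 1
    return counts


c1 = ["host1", "host2", "host4"]
c2 = ["host2", "host3", "host5"]
flat = expand([c1, c2])
print(dict(sorted(flat.items())))
# {'any': 2, 'host1': 1, 'host2': 2, 'host3': 1, 'host4': 1, 'host5': 1, 'rack1': 2, 'rack2': 2}

# A different pair of raw requests: host4 and host5 swapped between containers.
swapped = [["host1", "host2", "host5"], ["host2", "host3", "host4"]]
print(expand(swapped) == flat)  # True
```

Because both raw request sets flatten to the same host/rack/any counts, nothing in the flattened form tells the scheduler whether host1 and host4 were preferences of the same container.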



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
