hadoop-yarn-issues mailing list archives

From "Lei Guo (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3870) Providing raw container request information for fine scheduling
Date Tue, 30 Jun 2015 15:21:04 GMT
Lei Guo created YARN-3870:

             Summary: Providing raw container request information for fine scheduling
                 Key: YARN-3870
                 URL: https://issues.apache.org/jira/browse/YARN-3870
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: api, applications, capacityscheduler, fairscheduler, resourcemanager,
scheduler, yarn
            Reporter: Lei Guo

Currently, when an AM sends container requests to the RM and scheduler, each individual container
request is expanded into host/rack/any format. For instance, if I ask for a container with the
preference "host1, host2, host3", and all three hosts are in the same rack (rack1), then instead
of one raw container request being sent to the RM/scheduler with the raw preference list, the
request is expanded into 5 different objects: host1, host2, host3, rack1 and any. By the time the
scheduler receives this information, the raw request has already been lost. This is OK for a
single container request, but it causes trouble when dealing with multiple container requests
from the same application. Consider this case:
6 hosts, two racks:
rack1 (host1, host2, host3) rack2 (host4, host5, host6)
Suppose the application requests two containers with different data locality preferences:
c1: host1, host2, host4
c2: host2, host3, host5
This ends up as the following container request list when the client sends the request to the RM/scheduler:
host1: 1 instance
host2: 2 instances
host3: 1 instance
host4: 1 instance
host5: 1 instance
rack1: 2 instances
rack2: 2 instances
any: 2 instances
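
The expansion above can be reproduced with a short sketch. This is not YARN client code: the `RACK_OF` mapping and the `expand` function are hypothetical names introduced only for illustration, and the sketch assumes each rack is counted once per container request, which is what the table above implies.

```python
from collections import Counter

# Hypothetical topology from the example: host -> rack.
RACK_OF = {
    "host1": "rack1", "host2": "rack1", "host3": "rack1",
    "host4": "rack2", "host5": "rack2", "host6": "rack2",
}


def expand(requests):
    """Aggregate raw per-container host preferences into host/rack/any counts."""
    table = Counter()
    for hosts in requests:
        # One count per preferred host.
        for host in hosts:
            table[host] += 1
        # Assumption: a rack is counted once per container request,
        # no matter how many of the preferred hosts it contains.
        for rack in {RACK_OF[h] for h in hosts}:
            table[rack] += 1
        # Every container request also contributes one "any" entry.
        table["any"] += 1
    return dict(table)


c1 = ["host1", "host2", "host4"]
c2 = ["host2", "host3", "host5"]
expanded = expand([c1, c2])

for name in sorted(expanded):
    print(f"{name}: {expanded[name]} instance(s)")
```

Note that the result is a single table of counts per resource name; nothing in it records which entries came from c1 and which from c2.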
Fundamentally, it is hard for the scheduler to make the right decision without knowing the raw
container requests, because the expanded list no longer records which host preferences belong to
which container. The situation gets worse when dealing with affinity, anti-affinity, or gang
scheduling.
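
To illustrate the loss of information, here is a minimal sketch showing that two different sets of raw requests collapse into the same expanded list. The `expand` helper and the `RACK_OF` mapping are hypothetical, written only for this example (assuming one rack count per container request), and the `swapped` request set is an invented variation on the case described in this issue.

```python
from collections import Counter

# Hypothetical topology from the example: host -> rack.
RACK_OF = {
    "host1": "rack1", "host2": "rack1", "host3": "rack1",
    "host4": "rack2", "host5": "rack2", "host6": "rack2",
}


def expand(requests):
    """Collapse raw per-container preferences into host/rack/any counts."""
    table = Counter()
    for hosts in requests:
        table.update(hosts)                        # one count per host
        table.update({RACK_OF[h] for h in hosts})  # one count per distinct rack
        table["any"] += 1                          # one "any" per container
    return dict(table)


# The two containers from the case in this issue.
original = [["host1", "host2", "host4"], ["host2", "host3", "host5"]]
# An invented variation: the rack2 preferences of the two containers are swapped.
swapped = [["host1", "host2", "host5"], ["host2", "host3", "host4"]]

same_table = expand(original) == expand(swapped)
print(same_table)
```

Both request sets produce identical counts, so a scheduler that only sees the expanded list cannot tell whether host4 was preferred together with host1 and host2 or with host2 and host3.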

We need some way to provide raw container request information for fine-grained scheduling purposes.
