Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Sun, 3 Jan 2016 07:01:39 +0000 (UTC)
From: "Wangda Tan (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12841719.1435677632000.3795.1451804499954@Atlassian.JIRA>
In-Reply-To: <JIRA.12841719.1435677632000@Atlassian.JIRA>
References: <JIRA.12841719.1435677632000@Atlassian.JIRA>
 <JIRA.12841719.1435677632777@arcas>
Subject: [jira] [Commented] (YARN-3870) Providing raw container request
 information for fine scheduling
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076775#comment-15076775 ] 

Wangda Tan commented on YARN-3870:
----------------------------------

[~kasha],

bq. Was fleshing this out further. The number of IDs, and hence ResourceRequests, could be O(num. outstanding containers), which could pose problems as outlined in YARN-371. In fact, this JIRA is a duplicate of YARN-371: maybe we should close this and continue the discussion there.

As I mentioned above, I don't think we should combine YARN-371 and YARN-4485: to me, YARN-4485 is more of an internal change to the scheduler.

Let's say an AM originally requests 1000 containers (T1), then requests 1200 containers (T2); after the scheduler has allocated 100 containers, the AM requests 1200 containers again (T3).

For the original request, the scheduler records: T1, 1000.
After T2, the scheduler records: T1, 1000; T2, 200.
After T3, the scheduler records: T1, 900 (the 100 allocated containers are taken from T1); T2, 200; T3, 100.

Instead of recording a timestamp for every resource request, we only need to record, per AM, a mapping from timestamp to #pending-requests. The scheduler then "dequeues" from this time-sorted mapping, oldest first, as containers are allocated.
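The bookkeeping described above can be sketched in Python as a per-application FIFO. This is a minimal sketch, not scheduler code: `PendingRequestQueue` and its methods are hypothetical names, timestamps are assumed to be assigned by the scheduler in strictly increasing order, and the behavior when an AM lowers its total (cancelling the newest entries first) is an assumption the comment does not address.

```python
from collections import OrderedDict


class PendingRequestQueue:
    """Hypothetical per-application record of timestamp -> #pending-requests.

    Assumes timestamps are assigned by the scheduler and strictly increase,
    so the insertion order of the OrderedDict is also time order.
    """

    def __init__(self):
        self._buckets = OrderedDict()

    def total_pending(self):
        return sum(self._buckets.values())

    def update_request(self, timestamp, total):
        """Record the AM's new total of outstanding containers."""
        delta = total - self.total_pending()
        if delta > 0:
            # Only the increase is tagged with the new timestamp;
            # older pending requests keep their original timestamps.
            self._buckets[timestamp] = delta
        elif delta < 0:
            # Assumption (not from the source): when the AM lowers its
            # total, cancel the newest pending requests first.
            to_cancel = -delta
            for ts in reversed(list(self._buckets)):
                take = min(to_cancel, self._buckets[ts])
                self._buckets[ts] -= take
                to_cancel -= take
                if self._buckets[ts] == 0:
                    del self._buckets[ts]
                if to_cancel == 0:
                    break

    def allocate(self, n):
        """Dequeue n allocated containers, oldest timestamp first.

        Returns the (timestamp, count) pairs that were satisfied.
        """
        served = []
        while n > 0 and self._buckets:
            ts, count = next(iter(self._buckets.items()))
            take = min(n, count)
            served.append((ts, take))
            n -= take
            if take == count:
                self._buckets.popitem(last=False)
            else:
                self._buckets[ts] = count - take
        return served

    def snapshot(self):
        return list(self._buckets.items())


# Replay the example from the comment.
q = PendingRequestQueue()
q.update_request("T1", 1000)   # T1, 1000
q.update_request("T2", 1200)   # T1, 1000; T2, 200
q.allocate(100)                # 100 containers served from T1
q.update_request("T3", 1200)
print(q.snapshot())  # [('T1', 900), ('T2', 200), ('T3', 100)]
```

Replaying T1, T2, an allocation of 100 containers, and T3 reproduces the records listed above. Because only deltas are stored, the state grows with the number of request updates that raise the total, not with the number of outstanding containers.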

As you said, it would be hard to ask the AM to set the ID, but the scheduler can easily set it. However, this solution needs more work if we want to preserve these timestamps across RM restarts.

> Providing raw container request information for fine scheduling
> ---------------------------------------------------------------
>
>                 Key: YARN-3870
>                 URL: https://issues.apache.org/jira/browse/YARN-3870
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications, capacityscheduler, fairscheduler, resourcemanager, scheduler, yarn
>            Reporter: Lei Guo
>            Assignee: Karthik Kambatla
>
> Currently, when an AM sends container requests to the RM and scheduler, it expands each individual container request into host/rack/any format. For instance, if I ask for a container with preference "host1, host2, host3", all in the same rack rack1, then instead of sending one raw container request with the raw preference list to the RM/scheduler, it expands it into 5 different objects: host1, host2, host3, rack1 and any. By the time the scheduler receives the information, the raw request has already been lost. This is OK for a single container request, but it causes trouble when dealing with multiple container requests from the same application. Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When the application requests two containers with different data locality preferences:
> c1: host1, host2, host4
> c2: host2, host3, host5
> the client ends up sending the following container request list to the RM/scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgement without knowing the raw container requests. The situation gets worse when dealing with affinity and anti-affinity, or even gang scheduling.
> We need some way to provide raw container request information for fine-grained scheduling purposes.
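To make the information loss concrete, here is a minimal Python sketch of the flattening described in the issue. `expand` and `RACK_OF` are hypothetical illustrations, not the AMRMClient or ResourceRequest API; the sketch models only the counting rule implied by the example (one count per preferred host, one per distinct rack containing a preferred host, one for any). It reproduces the list above for c1 and c2, and shows that a different pair of raw requests flattens to exactly the same list, so the scheduler cannot recover which container wanted which hosts.

```python
from collections import Counter

# Example topology from the issue: 6 hosts in two racks.
RACK_OF = {
    "host1": "rack1", "host2": "rack1", "host3": "rack1",
    "host4": "rack2", "host5": "rack2", "host6": "rack2",
}


def expand(raw_requests, rack_of):
    """Flatten raw per-container preference lists into host/rack/any counts.

    Each container adds 1 to every preferred host, 1 to each distinct rack
    holding a preferred host, and 1 to "any".
    """
    counts = Counter()
    for hosts in raw_requests:
        for host in hosts:
            counts[host] += 1
        for rack in {rack_of[h] for h in hosts}:
            counts[rack] += 1
        counts["any"] += 1
    return counts


# c1 and c2 from the issue description.
original = [["host1", "host2", "host4"], ["host2", "host3", "host5"]]
# A different pair of raw requests (host4 and host5 swapped between them).
alternative = [["host1", "host2", "host5"], ["host2", "host3", "host4"]]

print(sorted(expand(original, RACK_OF).items()))
# [('any', 2), ('host1', 1), ('host2', 2), ('host3', 1), ('host4', 1),
#  ('host5', 1), ('rack1', 2), ('rack2', 2)]
print(expand(original, RACK_OF) == expand(alternative, RACK_OF))  # True
```

If the scheduler has a free slot on host4, the original requests would place it for c1 while the alternative would place it for c2; the flattened counts alone cannot distinguish the two.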


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)