Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Sat, 16 May 2015 00:16:01 +0000 (UTC)
From: "Bikas Saha (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12706637.1396623097000.132843.1431735361861@Atlassian.JIRA>
In-Reply-To: <JIRA.12706637.1396623097000@Atlassian.JIRA>
References: <JIRA.12706637.1396623097000@Atlassian.JIRA>
 <JIRA.12706637.1396623097659@arcas>
Subject: [jira] [Commented] (YARN-1902) Allocation of too many containers
 when a second request is done with the same resource capability
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546421#comment-14546421 ] 

Bikas Saha commented on YARN-1902:
----------------------------------

Yes. The RM may then give a container on H1 that is not useful for the app. If we again auto-decrement and release that container, we end up with 2 outstanding requests, and the job will hang because it needs 3 containers.

> Allocation of too many containers when a second request is done with the same resource capability
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1902
>                 URL: https://issues.apache.org/jira/browse/YARN-1902
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>            Reporter: Sietse T. Au
>            Assignee: Sietse T. Au
>              Labels: client
>         Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
>
>
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y: addContainerRequest is called z times with x, allocate is called, and at least one of the z allocated containers is started. If addContainerRequest is then called once more, followed by an allocate call to the RM, (z+1) containers are allocated where only 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls. 
> Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but that the correct behavior is observed only in the second scenario.
> Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. Because the table is a Map<Resource, ResourceRequestInfo>, ResourceRequestInfo holds no information about whether a request has already been sent to the RM.
> There are workarounds for this, such as releasing the excess containers received.
> The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when a request has been successfully sent to the RM.
> The patch includes a test for scenario 1.
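
To make the (z+1) arithmetic in the description concrete, here is a minimal sketch in Java. It is a toy model only and not the AMRMClientImpl code: the class RequestTableSketch, the String key standing in for Resource, and the resetAfterSend flag are all hypothetical names introduced for illustration. It assumes that each allocate call sends the accumulated count for a capability, and it models the described fix as clearing that count once the request has been sent.

```java
import java.util.HashMap;
import java.util.Map;

public class RequestTableSketch {
    // Toy stand-in for remoteRequestsTable: capability -> number of containers to ask for.
    // (Hypothetical simplification; the real table maps Resource to ResourceRequestInfo.)
    private final Map<String, Integer> table = new HashMap<>();
    private final boolean resetAfterSend;

    public RequestTableSketch(boolean resetAfterSend) {
        this.resetAfterSend = resetAfterSend;
    }

    /** Registers one more container request for the given capability. */
    public void addContainerRequest(String capability) {
        table.merge(capability, 1, Integer::sum);
    }

    /** Returns the container count that this allocate call would send to the RM. */
    public int allocate(String capability) {
        int ask = table.getOrDefault(capability, 0);
        if (resetAfterSend) {
            // Models the described fix: start from a fresh request once one has been sent.
            table.remove(capability);
        }
        return ask;
    }

    /** Adds z requests, allocates, adds one more request, and returns the second ask. */
    public static int secondAsk(int z, boolean resetAfterSend) {
        RequestTableSketch client = new RequestTableSketch(resetAfterSend);
        for (int i = 0; i < z; i++) {
            client.addContainerRequest("y");
        }
        client.allocate("y");            // first ask: z containers
        client.addContainerRequest("y"); // one more request with the same capability
        return client.allocate("y");
    }

    public static void main(String[] args) {
        System.out.println(secondAsk(3, false)); // prints 4, i.e. z+1
        System.out.println(secondAsk(3, true));  // prints 1
    }
}
```

Without the reset, the table cannot tell the z requests that were already sent apart from the new one, so the second ask carries z+1. With the reset, only the new request is sent.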

