hadoop-yarn-issues mailing list archives

From "Sietse T. Au (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
Date Fri, 04 Apr 2014 15:26:15 GMT

     [ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sietse T. Au updated YARN-1902:
-------------------------------

    Description: 
Regarding AMRMClientImpl

Scenario 1:
Given a ContainerRequest x with Resource y: when addContainerRequest is called z times with
x, allocate is called, and at least one of the z allocated containers is started, then a subsequent
addContainerRequest call followed by an allocate call to the RM results in (z+1) containers
being allocated, where only 1 container is expected.

Scenario 2:
Same as Scenario 1, except that no containers are started between the allocate calls.

Analyzing debug logs of the AMRMClientImpl, I have found that (z+1) containers are indeed
requested in both scenarios, but only in the second scenario is the correct behavior observed.

Looking at the implementation, I have found that this (z+1) request is caused by the structure
of the remoteRequestsTable. A consequence of it being a Map<Resource, ResourceRequestInfo> is
that ResourceRequestInfo holds no information about whether a request has already been sent
to the RM.
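The bookkeeping problem can be sketched in a few lines. This is a minimal model, not the actual Hadoop code: the class name, the String key standing in for Resource, and the plain counter standing in for ResourceRequestInfo are all illustrative. The point is that the table is keyed only by the resource capability and carries a cumulative count, with no record of what has already been shipped to the RM.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the remoteRequestsTable bookkeeping in AMRMClientImpl
// (names and types simplified; not the real Hadoop classes).
public class RemoteRequestsTableSketch {
    // Stand-in for Map<Resource, ResourceRequestInfo>: capability -> cumulative count.
    private final Map<String, Integer> remoteRequestsTable = new HashMap<>();

    // Each addContainerRequest bumps the cumulative count for that capability.
    public void addContainerRequest(String capability) {
        remoteRequestsTable.merge(capability, 1, Integer::sum);
    }

    // allocate() ships the full cumulative count every time; nothing marks
    // previously sent requests as already satisfied.
    public int allocate(String capability) {
        return remoteRequestsTable.getOrDefault(capability, 0);
    }

    public static void main(String[] args) {
        RemoteRequestsTableSketch client = new RemoteRequestsTableSketch();
        int z = 3;
        for (int i = 0; i < z; i++) {
            client.addContainerRequest("<2GB,1vcore>");
        }
        // First allocate asks for z containers, as expected.
        System.out.println(client.allocate("<2GB,1vcore>"));

        // One more request with the same capability: the table now asks for
        // z+1 containers, although only 1 new container is wanted.
        client.addContainerRequest("<2GB,1vcore>");
        System.out.println(client.allocate("<2GB,1vcore>"));
    }
}
```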

There are workarounds for this, such as releasing the excess containers received.

The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo once
a request has been successfully sent to the RM.
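The idea behind the fix can be sketched against the same simplified model as above (again, illustrative names only, not the actual patch): once a request has been shipped to the RM, a fresh zero-count request replaces it, so a later addContainerRequest starts a new ask instead of re-inflating the old one.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the proposed fix: reset the pending request after a
// successful send, mirroring "initialize a new ResourceRequest" in the patch.
public class PatchedRequestsTableSketch {
    // capability -> containers requested since the last successful send.
    private final Map<String, Integer> pending = new HashMap<>();

    public void addContainerRequest(String capability) {
        pending.merge(capability, 1, Integer::sum);
    }

    // Ships only what accumulated since the last send, then starts a new
    // (empty) request for that capability.
    public int allocate(String capability) {
        int toSend = pending.getOrDefault(capability, 0);
        pending.put(capability, 0); // fresh request after a successful send
        return toSend;
    }

    public static void main(String[] args) {
        PatchedRequestsTableSketch client = new PatchedRequestsTableSketch();
        client.addContainerRequest("<2GB,1vcore>");
        client.addContainerRequest("<2GB,1vcore>");
        client.addContainerRequest("<2GB,1vcore>");
        // First allocate asks for the 3 accumulated containers.
        System.out.println(client.allocate("<2GB,1vcore>"));
        // A later single request now yields a single-container ask, not 4.
        client.addContainerRequest("<2GB,1vcore>");
        System.out.println(client.allocate("<2GB,1vcore>"));
    }
}
```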



  was:
Regarding AMRMClientImpl

Scenario 1:
Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with
x, allocate is called and at least one of the z allocated containers is started, then if another
addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers
will be allocated, where 1 container is expected.

Scenario 2:
This behavior does not occur when no containers are started between the allocate calls. 

Analyzing debug logs of the AMRMClientImpl, I have found that indeed a (z+1) are requested
in both scenarios, but that only in the second scenario, the correct behavior is observed.

Looking at the implementation I have found that this (z+1) request is caused by the structure
of the remoteRequestsTable. The consequence of Map<Resource, ResourceRequestInfo> is
that ResourceRequestInfo does not hold any information about whether a request has been sent
to the RM yet or not.

There are workarounds for this, such as releasing the excess containers received.

The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when
a request has been successfully sent to the RM.




> Allocation of too many containers when a second request is done with the same resource capability
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1902
>                 URL: https://issues.apache.org/jira/browse/YARN-1902
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Sietse T. Au
>              Labels: patch
>         Attachments: YARN-1902.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
